Fault tolerant cell array architecture

ABSTRACT

A data processing system containing a monolithic network of cells with sufficient redundancy provided through direct logical replacement of defective cells by spare cells to allow a large monolithic array of cells without uncorrectable defects to be organized, where the cells have a variety of useful properties. The data processing system according to the present invention overcomes the chip-size limit and off-chip connection bottlenecks of chip-based architectures, the von Neumann bottleneck of uniprocessor architectures, the memory and I/O bottlenecks of parallel processing architectures, and the input bandwidth bottleneck of high-resolution displays, and supports integration of up to an entire massively parallel data processing system into a single monolithic entity.

This application is a continuation of U.S. application Ser. No. 10/000,813, filed on Nov. 30, 2001, now U.S. Pat. No. 6,636,986 entitled “Output and/or Input Coordinated Processing Array”, which is a continuation of U.S. application Ser. No. 09/679,168, filed on Oct. 4, 2000, now U.S. Pat. No. 6,408,402 entitled “Efficient Direct Replacement Cell Fault Tolerant Architecture”, which is a continuation of U.S. application Ser. No. 09/376,194, filed on Aug. 18, 1999, now U.S. Pat. No. 6,154,855 entitled “Efficient Direct Replacement Cell Fault Tolerant Architecture”, which is a continuation of U.S. application Ser. No. 08/821,672, filed on Mar. 19, 1997, now U.S. Pat. No. 6,038,682 entitled “A Fault Tolerant Data Processing System Fabricated on a Monolithic Substrate”, which is a continuation of U.S. application Ser. No. 08/618,397, filed on Mar. 19, 1996, now U.S. Pat. No. 5,748,872 entitled “Efficient Direct Replacement Cell Fault Tolerant Architecture”, which is a continuation of U.S. application Ser. No. 08/216,262, filed on Mar. 22, 1994, now abandoned, entitled “Efficient Direct Replacement Cell Fault Tolerant Architecture”.

FIELD OF THE INVENTION

The present invention relates to improvements in data processing systems. More particularly, the invention is directed to eliminating performance bottlenecks and reducing system size and cost by increasing the memory, processing, and I/O capabilities that can be integrated into a monolithic region.

BACKGROUND OF THE INVENTION

Early computer circuits were made of separate components wired together on a macroscopic scale. The integrated circuit combined all circuit components (resistors, capacitors, transistors, and conductors) onto a single substrate, greatly decreasing circuit size and power consumption, and allowing circuits to be mass produced already wired together. This mass production of completed circuitry initiated the astounding improvements in computer performance, price, power and portability of the past few decades. But lithographic errors have set limits on the complexity of circuitry that can be fabricated in one piece without fatal flaws. To eliminate these flaws, large wafers of processed substrate are diced into chips so that regions with defects can be discarded. Improvements in lithography allow continually increasing levels of integration on single chips, but demands for more powerful and more portable systems are increasing faster still.

Portable computers using single-chip processors can be built on single circuit boards today, but because lithographic errors limit the size and complexity of today's chips, each system still requires many separate chips. Separate wafers of processor, memory, and auxiliary chips are diced into their component chips, a number of which are then encapsulated in bulky ceramic packages and affixed to an even bulkier printed circuit board to be connected to each other, creating a system many orders of magnitude bigger than its component chips. Using separate chips also creates off-chip data flow bottlenecks because the chips are connected on a macroscopic rather than a microscopic scale, which severely limits the number of interconnections. Macroscopic inter-chip connections also increase power consumption. Furthermore, even single board systems use separate devices external to that board for system input and output, further increasing system size and power consumption. The most compact systems thus suffer from severe limits in battery life, display resolution, memory, and processing power.

Reducing data traffic across the off-chip bottleneck and increasing processor-to-memory connectivity through adding memory to processor chips is known in the art. Both Intel's new Pentium (tm) processor and IBM/Motorola/Apple's PowerPC (tm) 601 processor use 256-bit-wide data paths to small on-chip cache memories to supplement their 64-bit wide paths to their systems' external-chip main memories (“RISC Drives PowerPC”, BYTE, August 1993, “Intel Launches a Rocket in a Socket”, BYTE, May 1993). Chip size limits, however, prevent the amount of on-chip memory from exceeding a tiny fraction of the memory used in a whole system.

Parallel computer systems are well known in the art. IBM's 3090 mainframe computers, for example, use parallel processors sharing a common memory. While such shared memory parallel systems do remove the von Neumann uniprocessor bottleneck, the funnelling of memory access from all the processors through a single data path rapidly reduces the effectiveness of adding more processors. Parallel systems that overcome this bottleneck through the addition of local memory are also known in the art. U.S. Pat. No. 5,056,000, for example, discloses a system using both local and shared memory, and U.S. Pat. No. 4,591,981 discloses a local memory system where each “local memory processor” is made up of a number of smaller processors sharing that “local” memory. But in these systems the local processor/memory clusters contain many separate chips, and while each processor has its own local input and output, that input and output is done through external devices. This requires complex macroscopic (and hence off-chip-bottleneck-limited) connections between the processors and external chips and devices, which rapidly increases the cost and complexity of the system as the number of processors is increased.

Massively parallel computer systems are also known in the art. U.S. Pat. Nos. 4,622,632, 4,720,780, 4,873,626, and 4,942,517, for instance, disclose examples of systems comprising arrays of processors where each processor has its own memory. Even massively parallel computer systems where separate sets of processors have separate paths to I/O devices, such as those disclosed in U.S. Pat. Nos. 4,591,980, 4,933,836 and 4,942,517 and Thinking Machines Corp.'s CM-5 Connection Machine (tm), rely on connections to external devices for their input and output (“Machines from the Lunatic Fringe”, TIME, Nov. 11, 1991). Having each processor set connected to an external I/O device also necessitates having a multitude of connections between the processor array and the external devices, thus greatly increasing the overall size, cost and complexity of the system.

Multi-processor chips are also known in the art. U.S. Pat. No. 5,239,654, for example, calls for “several” parallel processors on an image processing chip. Even larger numbers of processors are possible; Thinking Machines Corp.'s original CM-1 Connection Machine, for example, used 32 processors per chip to reduce the numbers of separate chips and off-chip connections needed for (and hence the size and cost of) the system as a whole (U.S. Pat. No. 4,709,327). The chip-size limit, however, forces a severe trade-off between number and size of processors in such architectures; the CM-1 chip used 1-bit processors instead of the 8-bit to 32-bit processors in common use at that time. But even for massively parallel tasks, trading one 32-bit processor per chip for 32 one-bit processors per chip does not produce any performance gains except for those tasks where only a few bits at a time can be processed by a given processor. Furthermore, these non-standard processors do not run standard software, requiring everything from operating systems to compilers to utilities to be re-written, greatly increasing the expense of programming such systems. Newer massively parallel systems such as the CM-5 Connection Machine use standard 32-bit full-chip processors instead of multiprocessor chips.

Input arrays are also known in the art. State-of-the-art video cameras, for example, use arrays of charge-coupled devices (CCD's) to gather parallel optical inputs into a single data stream. Combining an input array with a digital array processor is disclosed in U.S. Pat. No. 4,908,751, with the input array and processor array being separate devices and the communication between the arrays being shown as row-oriented connections, which would relieve but not eliminate the input bottleneck. Input from an image sensor to each processing cell is mentioned as an alternative input means in U.S. Pat. No. 4,709,327, although no means to implement this are taught. Direct input arrays that do analog filtering of incoming data have been pioneered by Carver Mead, et al. (“The Silicon Retina”, Scientific American, May 1991). The sizes of these arrays are also limited by lithographic errors, so systems based on such arrays are subjected to the off-chip data flow bottleneck.

Output arrays where each output element has its own transistor are also known in the art and have been commercialized for flat-panel displays, and some color displays use display elements with one transistor for each color. Since the output elements cannot add or subtract or edit-and-pass-on a data stream, such display elements can do no data decompression or other processing, so the output array requires a single uncompressed data stream, creating a bandwidth bottleneck as array size increases. These output arrays also have no defect tolerance, so every pixel must be functional or an obvious “hole” will show up in the array. This necessity for perfection creates low yields and high costs for such displays.

Systems that use wireless links to communicate with external devices are also known in the art. Cordless data transmission devices, including keyboards and mice, hand-held computer to desktop computer data links, remote controls, and portable phones are increasing in use every day. But increased use of such links and increases in their range and data transfer rates are all increasing their demands for bandwidth. Some electromagnetic frequency ranges are already crowded, making this transmission bottleneck increasingly a limiting factor. Power requirements also limit the range of such systems and often require the transmitter to be physically pointed at the receiver for reliable transmission to occur.

Fault-tolerant architectures are also known in the art. The most successful of these are the spare-line schemes used in memory chips. U.S. Pat. Nos. 3,860,831 and 4,791,319, for example, disclose spare-line schemes suitable for such chips. In practice, a 4 megabit chip, for example, might nominally have 64 cells each with 64 k active bits of memory in a 256×256 bit array, while each cell physically has 260 bits by 260 bits connected in a manner that allows a few errors per cell to be corrected by substituting spare lines, thus saving the cell. This allows a finer lithography to be used, increasing the chip's memory density and speed. Since all bits in a memory chip have the same function, such redundancy is relatively easy to implement for memory. Processors, however, have large numbers of circuits with unique functions (often referred to in the art as random logic circuits), and a spare circuit capable of replacing one kind of defective circuit cannot usually replace a different kind, making these general spare-circuit schemes impractical for processors.
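The arithmetic of the 4 megabit example can be checked with a short sketch. It uses only the figures quoted in the text (64 cells, 256×256 nominal bits per cell, 260×260 physical bits per cell); the function and field names are illustrative and are not part of any disclosed scheme.

```python
# Figures quoted in the text; the names below are illustrative only.
CELLS = 64            # memory cells on the chip
NOMINAL_SIDE = 256    # active rows (and columns) per cell
PHYSICAL_SIDE = 260   # fabricated rows (and columns) per cell, spares included


def spare_line_summary(cells=CELLS, nominal=NOMINAL_SIDE, physical=PHYSICAL_SIDE):
    """Return the nominal capacity, physical capacity and area overhead."""
    nominal_bits = cells * nominal * nominal
    physical_bits = cells * physical * physical
    return {
        "nominal_bits": nominal_bits,
        "physical_bits": physical_bits,
        "spare_lines_per_dimension": physical - nominal,
        "overhead_fraction": physical_bits / nominal_bits - 1.0,
    }


summary = spare_line_summary()
print(summary)
```

Under these figures the nominal capacity is 4,194,304 bits (4 megabits), each cell carries four spare lines per dimension, and the spare lines add roughly 3.1% to the bit count.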

Redundancy schemes that handle random logic circuits by replicating every circuit are also known in the art. These incorporate means for selecting the output of a correctly functioning copy of each circuit and ignoring or eliminating the output of a faulty copy. Of these replication schemes, circuit duplication schemes, as exemplified by U.S. Pat. Nos. 4,798,976 and 5,111,060, use the least resources for redundancy, but provide the least protection against defects because two defective copies of a given circuit (or a defect in their joint output line) still create an uncorrectable defect. Furthermore, it is necessary to determine which circuits are defective so that they can be deactivated. Many schemes therefore add a third copy of every circuit so that a voting scheme can automatically eliminate the output of a single defective copy. This, however, leads to a dilemma: When the voting is done on the output of large blocks of circuitry, there is a significant chance that two out of the three copies will have defects, but when the voting is done on the output of small blocks of circuitry, many voting circuits are needed, increasing the likelihood of errors in the voting circuits themselves. Ways to handle having two defective circuits out of three (which happens more frequently than the 2 defects out of 2 problem that the duplication schemes face) are also known. One tactic is to provide some way to eliminate defective circuits from the voting, as exemplified by U.S. Pat. No. 4,621,201. While this adds a diagnostic step to the otherwise dynamic voting process, it does allow a triplet with two defective members to still be functional. Another tactic, as exemplified by U.S. Pat. Nos. 3,543,048 and 4,849,657, calls for N-fold replication, where N can be raised to whatever level is needed to provide sufficient redundancy. Not only is a large N an inefficient use of space, but it increases the complexity of the voting circuits themselves, and therefore the likelihood of failures in them. This problem can be reduced somewhat, although not eliminated, by minimizing the complexity of the voting circuits, as U.S. Pat. No. 4,617,475 does through the use of an analog differential transistor added to each circuit replicate, allowing a single analog differential transistor to do the voting regardless of how many replicates of the circuit there are. Yet another tactic is to eliminate the “voting” by replicating circuits at the gate level to build the redundancy into the logic circuits themselves. U.S. Pat. No. 2,942,193, for example, calls for quadruplication of every circuit, and uses an interconnection scheme that eliminates faulty signals within two levels of where they originate. While this scheme can be applied to integrated circuits (although it predates them considerably), it requires four times as many gates, each with twice as many inputs, as equivalent non-redundant logic, increasing the circuit area and power requirements too much to be practical. All these N-fold redundancy schemes also suffer from problems where if the replicates are physically far apart, gathering the signals requires extra wiring, creating propagation delays, while if the replicates are close together, a single large lithographic error can annihilate the replicates en masse, thus creating an unrecoverable fault.
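The remark that two defects out of three occur more often than two defects out of two can be illustrated with a small probability sketch. It assumes, purely for illustration, that each copy of a circuit is defective independently with the same probability p; the text itself notes that a single large lithographic error can destroy neighboring replicates together, so independence is an optimistic simplification, and the function names are hypothetical.

```python
def p_duplicate_fails(p):
    """Probability that both copies of a duplicated circuit are defective."""
    return p * p


def p_triplet_has_two_or_more_bad(p):
    """Probability that at least two of three copies are defective,
    which defeats a simple majority vote."""
    return 3 * p * p * (1 - p) + p ** 3


# Assumed per-copy defect probability, chosen only for illustration.
p = 0.1
print(p_duplicate_fails(p), p_triplet_has_two_or_more_bad(p))
```

With the assumed p of 0.1, the duplicated pair fails with probability 0.01 while the triplet loses its majority with probability 0.028, consistent with the comparison made in the text for any p between 0 and 1.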

Cell-based fault-tolerant architectures are also known in the art. U.S. Pat. Nos. 3,913,072 and 5,203,005, for example, both disclose fault-tolerant schemes that connect whole wafers of cells into single fault-free cell chains, even when a significant number of the individual cells are defective. The resulting one-dimensional chains, however, lack the direct addressability needed for fast memory arrays, the positional regularity of array cells needed for I/O arrays, and the two-dimensional or higher neighbor-to-neighbor communication needed to efficiently handle most parallel processing tasks. This limits the usefulness of these arrangements to low or medium performance memory systems and to tasks dominated by one-dimensional or lower connectivity, such as sorting data. U.S. Pat. No. 4,800,302 discloses a global address bus based spare cell scheme that doesn't support direct cell-to-cell connections at all, requiring all communications between cells to be on the global bus. Addressing cells through a global bus has significant drawbacks; it does not allow parallel access of multiple cells, and comparing the cell's address with an address on the bus introduces a delay in accessing the cell. Furthermore, with large numbers of cells it is an inefficient user of power; in order for N cells to determine whether they are being addressed, each must check a minimum of log2(N) address bits (in binary systems), so an address signal requires enough power to drive N*log2(N) inputs. This is a high price in a system where all intercell signals are global.
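The N*log2(N) figure can be made concrete with a brief sketch. It counts only the address inputs that one bus signal must drive, rounds the bit count up when N is not a power of two (an assumption, since the text only states the minimum), and uses a hypothetical function name.

```python
def address_inputs_driven(n_cells):
    """Inputs one global address signal must drive when every one of
    n_cells compares the full binary address against its own."""
    if n_cells < 1:
        raise ValueError("n_cells must be at least 1")
    # Minimum number of binary address bits needed to distinguish n_cells.
    address_bits = (n_cells - 1).bit_length()
    return n_cells * address_bits


for n in (256, 65536):
    print(n, address_inputs_driven(n))
```

For 256 cells this gives 8 address bits and 2,048 driven inputs; for 65,536 cells it gives 16 address bits and 1,048,576 driven inputs, so the load grows faster than the cell count.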

Even cell-based fault-tolerant architectures that support two-dimensional connectivity are known in the art. U.S. Pat. No. 5,065,308 discloses a cell array that can be organized into a series of fault-free linear cell chains or a two-dimensional array of fault-free cells with neighbor-to-neighbor connections. Several considerations, however, diminish its applicability to large high-performance arrays at all but the lowest defect densities. While the cells can be addressed through their row and column connections IPN->OPS and IPE->OPW, this addressing is not direct in that a signal passing from West to East encounters two 3-input gates per cell (even assuming zero-delay passage through the processor itself). Thus while large cells create high defect rates, small cell sizes create significant delays in the propagation of signals across the array. Consider, for example, a wafer with 1 defect per square centimeter, which is reasonable for a leading edge production technology. On a 5″ wafer an 80 square centimeter rectangular array can be fabricated. Now consider what size cells might be suitable. With an 8 by 10 array of 1 cm square cells (less than half the size of a Pentium chip) the raw cell yield would be around 30%, or an average of 24 or 25 good cells. Only when every single column had at least one good cell, and that spaced by at most one row from the nearest good cell in each of the neighboring columns, could even a single 1×8 fault-free cell “array” be formed. This should happen roughly 10% of the time, for an abysmal overall 1% array cell yield. With wafer scale integration, however, smaller cell sizes are useful as the cells do not have to be diced and reconnected. As cell size decreases, yields grow rapidly, but the propagation delays grow, too. With 5 mm square cells a 16×20 raw cell array would fit, and the raw cell yield would be almost 75%, so most arrays would have around 240 good cells.

While an average column would have 15 good cells, it is the column with the fewest good cells that determines the number of rows in the final array. This would typically be 10 or 11 rows, creating 16×10 or 16×11 arrays. This would be a 50%-55% array cell yield, which is quite reasonable. But row-addressing signals propagated across the array would pass sequentially through up to 30 gates, creating far too long a delay for high-performance memory systems.
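The yield figures in this example follow from simple ratios, sketched below. The raw cell yields (about 30% and about 75%) and the 10% formation rate are taken directly from the text rather than derived from a defect model; the function names and the `fraction_of_wafers` parameter are illustrative assumptions.

```python
def expected_good_cells(raw_cells, raw_cell_yield):
    """Average number of defect-free cells among raw_cells fabricated cells."""
    return raw_cells * raw_cell_yield


def array_cell_yield(array_cells, raw_cells, fraction_of_wafers=1.0):
    """Cells usable in the final array divided by cells fabricated, scaled
    by the fraction of wafers on which such an array can be formed."""
    return fraction_of_wafers * array_cells / raw_cells


# 8 by 10 array of 1 cm square cells, raw cell yield around 30%.
print(expected_good_cells(8 * 10, 0.30))
print(array_cell_yield(1 * 8, 8 * 10, fraction_of_wafers=0.10))

# 16 by 20 array of 5 mm square cells, raw cell yield almost 75%.
print(expected_good_cells(16 * 20, 0.75))
print(array_cell_yield(16 * 10, 16 * 20), array_cell_yield(16 * 11, 16 * 20))
```

These ratios reproduce the quoted values: about 24 good cells and a 1% array cell yield for the large cells, and about 240 good cells with a 50% to 55% array cell yield for the small cells.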

This interconnection scheme also has problems when used for processing cells, although it is targeted for that use. The cell bypassing scheme does support two-dimensional neighbor-to-neighbor connectivity, and could support a column-oriented bus for each column, but it cannot support a corresponding row-oriented bus without the 2-gate-per-cell delay. Three dimensional connectivity could be accomplished only by extending the bypass scheme to physically three dimensional arrays, which cannot be made with current lithography, and higher-dimensional connectivities such as hyper-cube connectivity are out of the question. Even for two-dimensional neighbor-to-neighbor connectivity, this scheme has certain drawbacks.

While the row-oriented neighbor-to-neighbor connections never span a distance larger than one diagonal cell-center to cell-center, column-oriented neighbor-to-neighbor connections can be forced to span several defective or inactive cells. All intercell timing and power considerations must take into account the maximum capacitances and resistances likely to be encountered on such a path. This scheme also shifts the position of every cell in the entire rest of the column (relative to its same-logical-row neighbors) for each defective cell that is bypassed, which propagates the effects of each defective cell far beyond the neighborhood of the defect. This multi-cell shift also prevents this scheme from being useful in arrays where physical position of array cells is important, such as direct input or output cell arrays.

Fully integrated systems have been disclosed in PCT/CA 82/00525 by the present inventor. The present application includes numerous improvements to the art taught therein, including direct replacement of defective cells by replicating every external connection, extending the fault tolerant scheme to larger cells, extending the fault tolerance to memory arrays by providing direct addressing, fabrication of serial processors on the same substrate as the fault-tolerant arrays, the cooperative dynamic focusing of direct outputs and inputs, regional power sharing busses, uses for unassigned spare cells to handle and accelerate serial processing, spare pixels within individual cells, and the redirection and use of non-output photons to supplement or replace other power sources. It is to these improvements that the claims of the present application are directed.

SUMMARY OF THE INVENTION

It is therefore one object of the present invention to provide a highly redundant network of cells that allows a large array of cells to be organized from a monolithically fabricated unit, with at least moderate yields of defect-free arrays in spite of significant numbers of defective cells, where all array cells can be directly addressed and have access to a global data bus, allowing the cell array to be used as a compact high-performance memory system.

It is another object of the present invention to provide a highly redundant network of cells that allows a large array of cells to be organized on a monolithically fabricated unit, with at least moderate yields of defect-free arrays in spite of significant numbers of defective cells, where all array cells have bi-directional communication with their neighboring array cells in at least 3 total dimensions (of which at least two dimensions are physical), allowing the cell array to be efficiently used as a parallel processing system on massively parallel tasks of 3-dimensional or higher connectivity.

It is another object of the present invention to provide a cell-based fault-tolerant array containing sufficient redundancy to allow cells large enough to contain RISC (Reduced Instruction Set Computer) or CISC (Complex Instruction Set Computer) processors to be used while maintaining at least moderate yields on up to wafer-sized arrays.

It is a further object of the present invention to provide a highly parallel or massively parallel data processing system that reduces data contention across the off-chip data bottleneck, and increases the number and/or width of data paths available between processors and memories, through the integration of all main memory and all processors into a single monolithic entity.

It is a further object of the present invention to provide an ultra-high-resolution display containing a monolithic array of cells where each cell has optical direct output means, and memory and/or processing capacity in excess of that which the cell needs to manage its direct outputs, allowing the array to perform other functions for the system as a whole, and thus increasing the fraction of a monolithically fabricated system that can be devoted to the display.

It is also an object of the present invention to provide a data processing system that dynamically focuses wireless transmissions to external devices to minimize bandwidth contention and power requirements through monolithically integrated dynamically focusing phased arrays.

It is a further object of the present invention to provide a method for implementing any and all of the aforementioned objects of the present invention in a single thin sheet.

In accordance with one aspect of the invention, there is thus provided an apparatus containing a monolithic redundant network of cells from which a large defect-free array of cells can be organized, where each array cell can be directly addressed and can receive and send data through a global data bus, allowing the combined memories of the array cells to be used as a single monolithic high performance, high capacity memory module.

In accordance with another aspect of the invention, there is thus provided an apparatus containing a monolithic redundant network of cells from which a large defect-free array of cells can be organized, where each array cell has direct bi-directional communication with its nearest neighbor cells in at least three total dimensions, at least two of which are physical, enabling the array as a whole to efficiently process parallel tasks of three-dimensional or higher neighbor-to-neighbor connectivity.

In accordance with yet another aspect of the invention, there is thus provided a data processing system containing a monolithic redundant network of cells from which a large defect-free array of cells can be organized, where the array cells have fault-tolerant direct inputs and/or direct outputs, and where spare cells have no direct I/O's of their own but use the direct inputs and outputs of the defective cells, allowing the surface of the network as a whole to be substantially covered with direct inputs and/or outputs in use by array cells, without significant defects in the continuity of those direct inputs and/or outputs.

The present invention also provides, in another aspect thereof, a method for producing any of the above arrays of cells where the entire array is fabricated as a single thin sheet.

By the expression “fault tolerant” as used herein is meant the ability to function correctly in spite of one or more defective components.

By the expression “data processing system” as used herein is meant a system containing means for input from an external device (such as a human operator), means for memory, means for processing, and means for output to an external device (such as a human eye).

By the expression “defect-free array” as used herein is meant an array of cells where all defective array cells have been logically replaced by correctly functioning spare cells.

By the expression “highly parallel” as used herein is meant a problem, a task, or a system with at least 16 parallel elements.

By the expression “massively parallel” as used herein is meant a problem, a task, or a system with at least 256 parallel elements.

By the expression “spare-line scheme” as used herein is meant a fault tolerant architecture that uses one or more spare rows and/or columns of units that can be used to logically replace one or more whole rows and/or columns of units that contain defective units.

By the expression “direct replacement” is meant that when a unit replaces a defective unit it interacts with the rest of the system of which the units are a part in a manner logically identical to the way the defective unit would have had it not been defective.

By the expression “array” as used herein is meant elements arranged in a regular pattern of two or three physical dimensions, or as a regular two dimensional pattern on the surface of a three dimensional shape.

By the expression “large array of cells” as used herein is meant an array of cells that would, at the lithography with which it is made, and not considering spare cells, contain on the average a plurality of defective cells.

By the expression “moderate yield” as used herein is meant a yield in excess of 50%.

By the expression “high yield” as used herein is meant a yield in excess of 90%.

By the expression “extremely high yield” as used herein is meant a yield in excess of 99%.

By the expression “single substrate system” as used herein is meant a data processing system of which all parts are manufactured on a single substrate.

By the expression “direct output means” as used herein is meant means for a given cell to send an output signal to a device outside the array (such as a human eye) without that output signal being relayed through a neighboring cell, through a physical carrier common to that cell and other cells, or through a separate external output device.

By the expression “direct input means” as used herein is meant means for a given cell to receive an input signal from a device outside the array without that input signal being relayed through a neighboring cell, through a physical carrier common to that cell and other cells, or through a separate external input device.

By the expression “global input” as used herein is meant means for an individual cell to pick up an input signal from a physical carrier common to the cells, such as a global data bus.

By the expression “external output device” as used herein is meant an output device fabricated as a separate physical entity from the cell array.

By the expression “external input device” as used herein is meant an input device fabricated as a separate physical entity from the cell array.

By the expression “complementary direct input means and direct output means” as used herein is meant that the direct input means and direct output means of two identical devices with such means could communicate with each other through such means.

By the expression “means for communication with neighboring cells” as used herein is meant input means to receive a signal from at least one neighboring cell and output means to send a signal to at least one other neighboring cell without the signals being relayed through a carrier shared with other array cells or through an external device.

By the expression “full color” as used herein is meant the ability to display or distinguish at least 50,000 different hues (approximately as many shades as the average unaided human eye is capable of distinguishing).

By the expression “full motion video” as used herein is meant the ability to display at least 50 frames per second (the approximate rate beyond which the average unaided human eye notices no improvement in video quality).

By the expression “macroscopic” as used herein is meant something larger than the resolving power of the average unaided human eye, or larger than 50 microns.

By the expression “microscopic” as used herein is meant something smaller than the resolving power of the average unaided human eye, or smaller than 50 microns.

By the expression “thin sheet” as used herein is meant a sheet whose total thickness is less than 1 millimeter.

By the expression “regional” as used herein is meant something common to or associated with a plurality of cells in a region of the network of cells that is smaller than the entire network.

By the expression “directly addressable” as used herein is meant that a cell can be addressed through a single off/on signal for each physical array dimension, without any of these addressing signals being relayed through other cells.

By the expression “total dimensions” as used herein is meant the number of physical dimensions plus the number of logical dimensions; a 65,536 processor CM-1 Connection Machine computer, for example, has its processors connected in a hypercube of 16 total dimensions, three of which are physical and 13 of which are logical.
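The “total dimensions” example can be verified with a short sketch. It relies only on the standard fact that a binary hypercube of d dimensions has 2^d nodes; the function names are illustrative.

```python
def hypercube_dimensions(n_processors):
    """Total dimensions of a binary hypercube with n_processors nodes."""
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    dims = n_processors.bit_length() - 1
    if (1 << dims) != n_processors:
        raise ValueError("a binary hypercube needs a power-of-two node count")
    return dims


def logical_dimensions(total_dimensions, physical_dimensions):
    """Logical dimensions are the total dimensions minus the physical ones."""
    return total_dimensions - physical_dimensions


total = hypercube_dimensions(65536)
print(total, logical_dimensions(total, physical_dimensions=3))
```

For 65,536 processors this yields 16 total dimensions, and with three physical dimensions the remaining 13 are logical, matching the figures in the definition.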

By the expression “physical connection” as used herein is meant aconnection that relies on physical contact or sub-micron proximity.

By the expression “monolithic” as used herein is meant a contiguousregion of a substrate.

By the expression “phased array” as used herein is meant an array whoseelements individually control the phase or timing of their component ofa signal that the array as a whole emits or receives.

By the expression “dynamic focusing” as used herein is meant a focusingprocess whose focal length and/or direction are not predetermined, butare adjusted during operation to focus on a device.

By the expression “N-fold replication” as used herein is meant that Nfunctionally identical copies of a given unit are fabricated for eachcopy of that unit that is needed an operational system.

By the expression “N-for-1 redundancy” as used herein is meant that in the absence of errors any one of N units can fulfil the functions of a given unit.

By the expression “physical neighbors” is meant that the minimum distance between two cells is less than twice the width of a cell in that direction.

The expression “could be produced with identical lithographic patterns” is used solely to describe the similarity of the structures and is not to be construed as limiting the invention to embodiments produced with lithography.

BRIEF DESCRIPTION OF THE DRAWINGS

These and other objects, features and advantages of the invention will be more readily apparent from the following detailed description of the preferred embodiments of the invention in which:

FIG. 1A is a functional depiction of an array of processing cells with means for any of two spare cells to take over for any defective cell;

FIG. 1B is a functional depiction of an array of processing cells with means for any of three spare cells to take over for any defective cell;

FIG. 1C is a functional depiction of an array of processing cells with means for any of four spare cells to take over for any defective cell;

FIG. 1D is a functional depiction of another array of processing cells with means for any of four spare cells to take over for any defective cell;

FIG. 1E is a functional depiction of another array of processing cells with means for any of eight spare cells to take over for any defective cell;

FIG. 1F is a functional depiction of an array of processing cells with only one spare cell for every three array cells, yet with means for any of 3 spare cells to take over for any defective array cell;

FIG. 1G is a functional depiction of an array of processing cells with only one spare cell for every eight array cells, yet with means for any of two spare cells to take over for any defective array cell;

FIG. 1H is a functional depiction of an array of processing cells with only one column of spare cells for every four columns of array cells, yet with means for any of three spare cells to take over for any defective array cell;

FIG. 2 is a functional depiction of a spare cell that is able to respond to the address of any one of its four nearest neighbor array cells should it be used to replace one of those array cells;

FIG. 3 is a geometric depiction of a wafer with a memory array and a “mono-chip” CPU and other interface “chips”;

FIG. 4A is a functional depiction of an array cell with both processing and memory means in accordance with the invention;

FIG. 4B is a functional depiction of an array of such cells showing paths from a spare cell that can replace either of two neighboring array cells;

FIG. 4C is a functional depiction of an array of such cells showing paths from a spare cell that can replace any of three neighboring array cells;

FIG. 4D is a functional depiction of an array of such cells showing paths from a spare cell that can replace any of four neighboring array cells;

FIG. 4E is a functional depiction of an array of such cells showing alignment-insensitive contact means;

FIG. 5A is a functional depiction of an array of direct output data-decompression cells in accordance with the invention;

FIG. 5B is a functional depiction of one of the cells of FIG. 5A;

FIG. 6A is a functional depiction of an array of direct output data-decompression cells where the cells use neighbor-to-neighbor communication instead of cell addresses and a global input;

FIG. 6B is a functional depiction of one of the cells of FIG. 6A;

FIG. 7A is a functional depiction of a spare cell capable of using the direct outputs of any array cell it replaces;

FIG. 7B is a geometric depiction of the area occupied by the direct outputs of an array cell when a spare cell that may replace it will use those direct outputs;

FIG. 8A is a functional depiction of the physical parts of a classic serial data processing system;

FIG. 8B is a functional depiction of the data flow of a classic serial data processing system;

FIG. 8C is a functional depiction of the data flow of a classic massively parallel data processing system;

FIG. 9A is a functional depiction of the physical parts of an integrated massively parallel data processing system according to the present invention;

FIG. 9B is a functional depiction of the data flow of an integrated massively parallel data processing system according to the present invention;

FIG. 10 is a functional depiction of an array cell with direct output means and direct input means;

FIG. 11 is a geometric depiction of an array of processing cells using their direct inputs and outputs to communicate with an external device;

FIG. 12 is a functional depiction of one processing cell with several kinds of direct input and direct output;

FIG. 13 is a functional depiction of several cells using their direct output means as a phased array to focus on an external receiver;

FIG. 14A is a geometric depiction of a direct I/O processing cell with its own power absorption and storage means; and

FIG. 14B is a geometric depiction of an array of direct I/O processing cells fabricated as a thin sheet composed of a series of thin layers.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Direct Replacement Cell Fault Tolerant Architecture

Because lithographic errors limit the size of traditional chips, chip-based computer architectures use many separate chips for processing, memory and input/output control. A number of these separate processor, memory, and auxiliary chips are encapsulated in bulky ceramic packages and affixed to even bulkier printed circuit boards to connect to each other. A svelte processor chip like IBM/Apple/Motorola's PowerPC 601, for example, uses a ceramic holder 20 times its own size to allow it to be connected to a still-larger circuit board. While each chip uses wires fabricated on a microscopic scale (on the order of 1 micron) internally, the board-level interconnections between the chips use wires fabricated on a macroscopic scale (on the order of 1 millimeter, or 1000 times as wide). Because of this, chip-based architectures not only suffer from the expense of dicing wafers into chips then packaging and interconnecting those chips, and the corresponding bulk this creates, but also from limits in the number of connections that can be made between any given chip and the rest of the system. Once the chip-size limit is exceeded, the number of possible connections to the rest of the system drops by over 3 orders of magnitude, and the power required to drive each connection climbs markedly.

Several attempts to extend or overcome this lithographic chip-size limit are known in the prior art. For small highly repetitive circuits, generic replacement fault tolerant schemes are useful. The most commercially successful of these is the fabrication of extra bit and word lines on memory chips. A 4 megabit chip, for example, might nominally be composed of 64 cells of 64 k-bits each, while in order to increase the likelihood of having all 64 cells functional, each cell physically has 260 bit lines and 260 word lines instead of the 256×256 that are needed for 64 k bits. The spare lines are connected to the standard lines through a complex series of fuses so that they can act as direct replacements for individual faulty lines. This line-level redundancy allows a cell to recover from a few faulty bits, so a finer lithography more prone to small lithographic errors can be used without reducing the chip size limit. But large lithographic errors can span many lines, and this redundancy scheme does nothing to address such errors, so the overall chip size limit is not increased much. Furthermore, generic replacement fault tolerant schemes such as this do not support two-dimensional or higher neighboring unit to neighboring unit connectivity, and only work with small, highly repetitive circuits. Processors have large numbers of random logic circuits, and a spare circuit capable of replacing one kind of defective circuit cannot usually replace a different kind, making such general spare-circuit schemes impractical for processors.

Redundancy schemes that handle random logic circuits by replicating every circuit are also known in the art. These incorporate means for selecting the output of a correctly functioning copy of each circuit and ignoring or eliminating the output of a faulty copy. Of these replication schemes, circuit duplication schemes use the least resources for redundancy, but can be disabled by two defective copies of a single circuit or a single defect in their joint output line. Many schemes therefore add a third copy of every circuit so that a voting scheme can automatically eliminate the output of a single defective copy. This, however, leads to a dilemma: When the voting is done on the output of large blocks of circuitry, there is a significant chance that two out of the three copies will have defects, but when the voting is done on the output of small blocks of circuitry, many voting circuits are needed, increasing the likelihood of errors in the voting circuits themselves! Ways to handle having two defective circuits out of three (which occurs more frequently than the two-defects-out-of-two problem that the duplication schemes face) are also known. One tactic is to provide some way to eliminate defective circuits from the voting. While this does add a diagnostic step to the otherwise dynamic voting process, it does allow a triplet with two defective members to still be functional. Another tactic calls for N-fold replication, where N can be raised to whatever level is needed to provide sufficient redundancy. Not only is a large N an inefficient use of space, but it increases the complexity of the voting circuits themselves, and therefore the likelihood of failures in them. This problem can be reduced somewhat by minimizing the complexity of the voting circuits (through analog circuits, for example), or eliminated at great expense in circuit area and power through gate-level N-fold redundancy. Also, when these N-fold schemes use small units to enable a lower value of N to be used, a problem arises where if the replicates are physically far apart, gathering the signals requires significant extra wiring, creating propagation delays; while if the replicates are close together, a single large lithographic error can annihilate the replicates en masse, thus creating an unrecoverable fault.

Cell-based fault-tolerant architectures other than N-fold replication are also known in the art, but they do not support some of the most important features for general data processing: the direct addressability needed for fast memory arrays, the positional regularity of array cells needed for I/O arrays, and the higher than two-dimensional neighbor-to-neighbor communication needed to efficiently handle many real-world parallel processing tasks.

Accordingly, the fault tolerant data processing architecture according to one embodiment of the present invention overcomes this chip-size limit bottleneck with a monolithic network of cells with sufficient redundancy that a large fault-free array of cells can be organized where the array cells have a variety of attributes useful for data processing, including the direct addressability needed for fast memory arrays, the positional regularity of array cells needed for I/O arrays, and the higher than two-dimensional neighbor-to-neighbor communication needed to efficiently handle many real-world parallel processing tasks, and provides spare cells within the network interconnected in such a manner that a plurality of spare cells can directly replace the functions of any given array cell should that array cell prove defective, without the overhead of a plurality of dedicated replacements for each cell. This can be achieved by providing each spare cell with the ability to serve as a direct replacement for any one of a plurality of potentially defective neighboring array cells, in such a manner that the spare cells' replacement capabilities overlap. In this way an exceptional level of redundancy, and hence extremely high fault tolerance, can be provided from relatively few spare cells. The simplest way for a spare cell to serve as a direct replacement for an array cell is for the spare cell to have identical internal functions, or a superset thereof, and to have direct replacements for every connection the array cell uses in normal operation (it is possible to have “spare” cells and “array” cells be identical, although when a given spare cell can replace any one of a plurality of array cells this requires that some of the connections be idle in normal operation as an array cell). FIG. 1A shows an example of such an interconnection scheme where the network 10 of cells contains a column of spare cells 100′ for every two columns of array cells 100. From a spare cell's point of view, each spare cell (except those on the edges of the array) can take over for any one of its four nearest neighbor array cells, while from an array cell's point of view, there are two spare cells that can take over for any given defective array cell. In FIG. 1B, three spare cells are able to replace any defective array cell; while in FIG. 1C, four nearest neighbor spare cells can take over for any given defective array cell (this can also be done with a checkerboard pattern of array cells and spare cells, as shown in FIG. 1D).
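
The overlapping replacement capability described above can be sketched in a few lines of Python. The geometry here is an assumption made only for illustration, since the figure itself is not reproduced in the text: each spare column is taken to sit between two array columns and to be offset vertically by half a cell, so that spare `s` lies between array rows `s - 1` and `s`. The names `build_spare_map`, `spare_to_array` and `array_to_spares` are hypothetical.

```python
from collections import defaultdict


def build_spare_map(rows, groups):
    """Model one spare column per two array columns (assumed layout).

    Array cells are ("A", row, col); spare cells are ("S", s, group), where
    spare s is assumed to sit between array rows s - 1 and s.
    Returns (spare_to_array, array_to_spares).
    """
    spare_to_array = {}
    for g in range(groups):
        left_col, right_col = 2 * g, 2 * g + 1
        for s in range(rows + 1):
            # A spare covers the array cells diagonally adjacent to it.
            spare_to_array[("S", s, g)] = [
                ("A", r, c)
                for r in (s - 1, s) if 0 <= r < rows
                for c in (left_col, right_col)
            ]

    array_to_spares = defaultdict(list)
    for spare, cells in spare_to_array.items():
        for cell in cells:
            array_to_spares[cell].append(spare)
    return spare_to_array, dict(array_to_spares)


spare_to_array, array_to_spares = build_spare_map(rows=4, groups=2)
print(len(array_to_spares[("A", 1, 0)]))   # spares available to one array cell
print(len(spare_to_array[("S", 2, 0)]))    # array cells one interior spare can replace
```

Under this assumed layout every array cell ends up with two candidate spares while an interior spare covers four array cells, which is the overlap the paragraph relies on.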

This type of scheme creates an extremely error-tolerant system, which is of critical importance in allowing a large array of cells to be fabricated as a single unit. When pushing the limits of lithography it is not uncommon to average 200 errors per 5″ wafer. Under such conditions an implementation that allows any of three spare cells to take over for any defective cell will increase yields of a full-wafer network with 1000 cells per square inch from near zero to over 99.99%. For larger cells, such as those containing RISC or CISC processors, the 5-for-1 schemes of FIGS. 1C and 1D provide sufficient redundancy for similar yields for wafer-sized arrays of cells up to a few millimeters on a side even with error-prone leading edge lithography. With cells interconnected on a microscopic level there is no off-chip bottleneck to limit intercell connections, so this spare cell scheme can easily be extended to provide more redundancy by providing the ability for each spare cell to replace array cells in a wider area should one of those array cells prove defective. As the raw cell yield drops, however, it is necessary to add a rapidly increasing percentage of spare cells to the network to avoid spare-cell depletion. A 9-for-1 spare cell scheme where only ¼ of the cells are array cells, as shown in FIG. 1E, can maintain at least moderate array yields with raw cell yields as low as 50% on a 64-cell array.
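
A rough feel for why a few overlapping spares move the array yield from near zero to near one can be had from the following Python sketch. It is a simplified estimate, not the calculation behind the figures quoted above. It assumes defects are small, independent and uniformly scattered (a Poisson model). It also ignores contention between defective array cells for the same spare, and it takes half of the cells to be array cells. The function names are hypothetical.

```python
import math


def raw_cell_yield(defects_per_wafer, cells_per_wafer):
    """Probability that a single cell has no defect (Poisson assumption)."""
    return math.exp(-defects_per_wafer / cells_per_wafer)


def array_yield_estimate(raw_yield, n_array_cells, spares_per_cell):
    """Optimistic estimate: an array cell is lost only if it and every spare
    that could replace it are all defective; spare contention is ignored."""
    p_bad = 1.0 - raw_yield
    p_unrecoverable = p_bad ** (spares_per_cell + 1)
    return (1.0 - p_unrecoverable) ** n_array_cells


wafer_area_sq_in = math.pi * 2.5 ** 2          # 5-inch diameter wafer
cells = int(1000 * wafer_area_sq_in)           # 1000 cells per square inch
y = raw_cell_yield(200, cells)                 # 200 errors per wafer
for spares in (0, 1, 2, 3):
    # Assumption: half of the cells on the wafer are array cells.
    print(spares, array_yield_estimate(y, cells // 2, spares))
```

Because spare contention and large multi-cell defects are left out, the printed values are best read as upper bounds on what a real layout would achieve.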

Because all intercell connections are at a microscopic level, and because replacement cells are physically close to the cells they can replace, cells can devote enough interconnections to redundancy to support N-for-1 replacement schemes where N is very large. For a given arrangement of spare and array cells, the average distance from a cell to a spare cell that can replace it in a two dimensional N-for-1 replacement scheme is approximately proportional to the square root of N. For row and column direct addressing, row and column data busses, etc., the number of paths a spare cell needs in an N-for-1 replacement scheme also grows approximately with the square root of N because with large N's more of the cells it can replace will lie on the same row or column. For arrays with direct interprocessor communications, the number of paths per spare cell is proportional to N because dedicated paths are used to each cell. Even when both types of connections are used, N can be very large. A Pentium-sized cell, for example, has a circumference of over 60,000 microns, and a leading edge (0.5 micron, 5 metal layer) production line can easily devote 2 metal layers to redundancy. This allows a Pentium-sized cell to have 480 64-bit-wide paths across it in the redundancy layers. A typical array cell might use 4 such row/column paths for row/column addressing and busses, and 6 cell-cell paths for neighbors in a three dimensional (two physical, one logical) neighbor-neighbor network. The spare cell connections would take approximately 4*N+6*N*sqrt(N/2) equivalent paths, allowing N to be as large as 20 or so for Pentium-sized cells with today's lithography, even with 64-bit interconnections throughout. This would theoretically support raw cell yields down to 20% for an 8-to-1 spare/array cell ratio, or even down to 10% with a 15-to-1 spare/array cell ratio, with reasonable yields of defect free arrays. But because low raw-cell yields decrease the percentage of the wafer area used by good cells, and because monolithic architectures can use smaller cells than chip-based architectures due to the elimination of dicing and reconnecting, it is expected that in practice cell sizes will be picked relative to lithographic error rates to keep raw cell yields above 90% in most cases and above 50% in virtually all cases.
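
The path budget in the preceding paragraph can be checked numerically. The sketch below simply evaluates the stated estimate of 4*N+6*N*sqrt(N/2) equivalent paths against the stated budget of 480 paths per cell. The function names and the default arguments are illustrative choices, not part of the disclosure.

```python
import math


def equivalent_paths(n, row_col_paths=4, neighbor_paths=6):
    """Equivalent paths a spare cell needs in an N-for-1 scheme,
    using the estimate 4*N + 6*N*sqrt(N/2) from the text."""
    return row_col_paths * n + neighbor_paths * n * math.sqrt(n / 2)


def max_redundancy(path_budget, **path_counts):
    """Largest N whose equivalent path count fits within the budget."""
    n = 1
    while equivalent_paths(n + 1, **path_counts) <= path_budget:
        n += 1
    return n


print(max_redundancy(480))          # 20
print(round(equivalent_paths(20)))  # 459
print(round(equivalent_paths(21)))  # 492
```

With 480 available paths the estimate admits N = 20 and rejects N = 21, consistent with the "20 or so" figure given above.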

Cells can be extremely small, with a practical lower limit set by the frequency of large lithographic errors. Because small cells have high raw cell yields, low-N redundancy schemes are optimal. Errors significantly larger than a single cell can wreak havoc with such redundancy schemes, so a reasonable lower limit for cell diameter is the average length of the longest fault in a region the size of the final array. While simply reversing the patterns of spare and array cells from a high-N scheme (such as that shown in FIG. 1E) produces extremely fault tolerant systems from few spare cells, some modifications can be beneficial in obtaining maximum fault tolerance and usefulness of array cells. In FIG. 1F, for example, some array cells (example cell marked with a ′) have four neighboring spare cells, while other array cells (example cell marked with a ″) have only two neighboring spare cells. This can be balanced by shifting some of each spare cell's replacement capability from neighboring cells to next-to-neighbor cells, as shown in FIG. 1F, so that each array cell has three spare cells that can replace it. This provides 4-for-1 redundancy from having only one third as many spare cells as array cells in the network, whereas a classic 4-fold replication redundancy scheme would require 3 times as many spare cells as array cells. For cells with extremely high raw cell yields, schemes such as that shown in FIG. 1G provide 3-for-1 redundancy from only ⅛ as many spare as array cells. A problem arises, however, when these sparse-spare schemes are applied to either memory or direct display cells, in that the pattern of array cells is not a regular rectangular array. A column (or row) oriented sparse-spare scheme such as that shown in FIG. 1H provides as much redundancy from a similar number of spare cells as does the scheme of FIG. 1F, but it leaves the array cells in a regular rectangular array suitable for both directly addressable memory cells and direct display cells, and is thus preferable even though the average distance between a spare cell and the array cells it can replace is slightly longer and grows slightly faster as the scheme is extended to even sparser arrays. For lithographies with high rates of small errors, embodiments can use intra-cell redundancies, such as adding spare bit and word lines to a cell's memory in a manner identical to a standard memory chip's spare lines, so that a cell can tolerate a few defective bits without even requiring a spare cell to be brought into play.

Embodiments can also include means for the array to be self testing. One simple technique is to have all cells run a test routine that exercises every instruction, with the array locating defective cells by having each cell compare its results with those of all of its neighbors. Unless the same error occurs in the majority of cells in a region, the most common result in every region will be that from correctly functioning cells. Further embodiments can provide means for cells that test valid to vote to assassinate a defective neighbor by disconnecting its power supply. Disconnecting defective cells from their power supplies allows simple ‘OR’ gates to be used to combine paths from array and potential spare cells, as defective cell outputs will be forced to zero. Having separate means for a cell to be able to disconnect itself from power provides redundancy by preventing any single error from keeping a defective cell alive. Further embodiments provide means for the cells to automatically select a spare cell to replace any defective array cell. An algorithm can be as simple as just starting at one corner and working toward the opposite corner and, for every defective array cell, starting back at the original corner and searching for the first non-defective spare cell that can replace the defective array cell. A more sophisticated scheme could map out the defective cell density surrounding each cell, and replace defective array cells starting with the one with highest surrounding defect density and proceeding toward that with the lowest. For each defective array cell, the spare cells that could replace it would have their surrounding defect densities checked and the one with the lowest surrounding defect density would be chosen. Due to the high fault tolerance of the current invention, algorithms that investigate multiple patterns of cell replacement are not expected to be needed, although such schemes could be adapted from existing fault tolerant architectures or from circuit-routing software.
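
The self-test comparison and the simple corner-to-corner spare selection described above might be modeled in software as follows. This is a sketch under stated assumptions rather than the hardware mechanism: cells are represented by arbitrary hashable labels, the neighbor and candidate-spare relationships are supplied by the caller, and the names `find_defective` and `assign_spares` are hypothetical.

```python
from collections import Counter


def find_defective(results, neighbors):
    """Flag cells whose test result differs from the most common result
    among themselves and their neighbors."""
    flagged = set()
    for cell, own in results.items():
        region = [own] + [results[n] for n in neighbors[cell]]
        most_common, _ = Counter(region).most_common(1)[0]
        if own != most_common:
            flagged.add(cell)
    return flagged


def assign_spares(array_cells, candidates, defective):
    """Greedy selection: scan array cells in corner-to-corner order and give
    each defective one the first working, still unused spare in scan order."""
    used = set()
    assignment = {}
    for cell in array_cells:
        if cell not in defective:
            continue
        for spare in candidates[cell]:
            if spare not in defective and spare not in used:
                assignment[cell] = spare
                used.add(spare)
                break
        else:
            raise RuntimeError(f"no usable spare for {cell!r}")
    return assignment


results = {"a": 1, "b": 1, "c": 9, "d": 1}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(find_defective(results, neighbors))      # {'c'}

candidates = {"a0": ["s0", "s1"], "a1": ["s0", "s1"], "a2": ["s1", "s2"]}
print(assign_spares(["a0", "a1", "a2"], candidates, {"a0", "a1", "s2"}))
```

The density-ordered variant mentioned in the text would change only the order in which defective cells and candidate spares are visited; the greedy loop itself could stay the same.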

In traditional chip-based architectures the use of macroscopic interconnections between chips limits the number of connections that can be made between any given chip and the rest of the system, creating an off-chip data flow bottleneck. As processor clock speeds have increased faster than main memory chip speeds (“New Memory Architectures to Boost Performance”, BYTE, July 1993), and as processor chips use increasing numbers of processing pipelines to increase their overall speed, the access to off-chip main memory has started becoming a limiting factor in performance (“Fast Computer Memories”, IEEE Spectrum, October 1992). To reduce the need for communication across this bottleneck, new processor chips such as Intel's Pentium, Apple/IBM/Motorola's PowerPC 601, MIPS' 4400, and Digital's Alpha AXP (tm) processors all include large on-chip cache memories (“A Tale of Two Alphas”, BYTE, December, 1993). This allows most memory accesses to be fulfilled through wide on-chip data paths (256 bits wide for the PowerPC and Pentium) instead of the narrower (32 or 64 bits wide) data paths to off-chip main (RAM) memory. But the amount of on-chip memory that can be added to traditional chip-based processors is small compared to the overall main memory used in such systems. Bulky, expensive multi-chip path-width-limited main memories are still necessary in these architectures.

To free up more connections from the processor chip to the rest of the system in order to support a wider path to the main memory, a dual-ported main memory can be used to allow the processor and video subsystem to access the memory independently. This allows the processor to have control-only connections to the video subsystem, as the video subsystem can get its display data directly from the memory instead of from the processor, thus freeing up connections otherwise used to transfer video data from the processor chip. If these paths are then used to create a wider path to the main memory, the processor to memory access bottleneck can be temporarily relieved. Unfortunately for chip-based architectures, with both the processor and the video subsystem having separate paths to the memory, and with wider paths being used, such a solution requires greatly increasing the number of connections to each memory chip, which significantly increases the size and cost of the memory subsystem. If the individual memory chips could be made larger, fewer of them would be needed, and hence the total size and cost of the memory subsystem would be reduced or the number and width of paths to it increased. But high-capacity memory chips already push manufacturing capabilities; if a chip gets a 50% yield, a similar chip twice the size gets a 0.5×0.5 or 25% yield, and a chip four times the size gets a 0.5×0.5×0.5×0.5, or 6% yield.
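
The yield arithmetic at the end of the preceding paragraph follows from treating a chip k times the size as k base-sized regions that must all be defect free. A minimal Python sketch, assuming independent defects and using the hypothetical name `scaled_yield`:

```python
def scaled_yield(base_yield, size_factor):
    """Yield of a chip size_factor times the area of a chip with base_yield,
    assuming defects strike each base-sized region independently."""
    return base_yield ** size_factor


for factor in (1, 2, 4):
    print(factor, scaled_yield(0.5, factor))   # 0.5, 0.25, 0.0625
```

The value for a chip four times the size, 0.0625, is the figure rounded to 6% in the text.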

Accordingly, the fault tolerant monolithic data processing architecture in a preferred embodiment of the present invention overcomes the memory access bottleneck with a highly redundant monolithic network of memory cells that can be organized into a large fault-free array of cells, each of which can be directly addressed and can send and receive data via a global data bus. In the highly redundant network from which the array is formed, as shown in FIG. 2, the network 20 of cells contains directly addressable array cells 200 and spare cells 200′ interconnected in such a manner that should any array cell prove defective, at least two spare cells are capable of taking over its functions (for clarity, connections from only one spare cell are shown in FIG. 2). In order for a given spare cell to take over for a given array cell in this embodiment, it must be able to be directly addressed as if it were that array cell, and yet not respond to requests for any other array cell which it could have replaced. Further embodiments use techniques that minimize the power consumption and capacitance effects of unused connections, such as connecting a cell to multiple address lines and severing connections to unused lines through means such as those used to customize field-programmable gate arrays.
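
The addressing requirement placed on a spare cell can be illustrated with a small behavioral model. This is only a software analogy for the row and column select lines, and the class name `SpareCell` and its methods are hypothetical. The spare is wired to the addresses of every array cell it could replace, but answers only to the one address it has actually taken over.

```python
class SpareCell:
    """Behavioral model of a spare cell's address decoding (illustrative)."""

    def __init__(self, replaceable_addresses):
        # (row, column) addresses of the array cells this spare can replace.
        self.replaceable = set(replaceable_addresses)
        self.active_address = None   # None while the spare is unused

    def take_over(self, address):
        """Configure the spare to stand in for one defective array cell."""
        if address not in self.replaceable:
            raise ValueError(f"spare is not wired to address {address}")
        self.active_address = address

    def responds_to(self, row_select, col_select):
        """True only for the address of the cell actually replaced."""
        return self.active_address == (row_select, col_select)


spare = SpareCell([(0, 0), (0, 1), (1, 0), (1, 1)])
print(spare.responds_to(0, 1))   # False: spare not yet in use
spare.take_over((0, 1))
print(spare.responds_to(0, 1))   # True
print(spare.responds_to(1, 1))   # False: could have replaced it, but did not
```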

Although each cell could theoretically have only a single bit of memory, the power required in addressing a bit within a cell grows linearly with the number of rows plus columns of cells in the array, but only with the log (base 2 for binary systems) of the number of bits in each cell. Practical considerations thus dictate cells with at least 256 bits, and preferably more, for use in low-power, high performance memory systems, with an upper size limit set by lithographic error rates. In practice memory-only cells according to the present architecture are expected to internally resemble the cells on current memory chips, which typically have 64 k bits per cell. Using direct addressing of cells in such an array allows each cell's memory to be used as part of a global memory without the performance loss of indirect addressing or sending data through other cells. Thus the array as a whole can be used as a compact high-performance monolithic memory system. Using the same lithography used for today's 16 megabit chips, this embodiment can pack a gigabit, or over 100 megabytes, onto a single monolithic region that can be fabricated on a 6″ wafer.
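
The trade-off between cell count and cell size can be made concrete with a toy cost model. The sketch below assumes a square array of cells and assigns one unit of cost per cell row, per cell column, and per address bit within a cell. These unit weights are an illustrative assumption, not figures from the disclosure, and `addressing_cost` is a hypothetical name.

```python
import math


def addressing_cost(total_bits, bits_per_cell):
    """Relative addressing cost: (rows + columns of cells) + log2(bits per cell)."""
    cells = total_bits // bits_per_cell
    side = math.isqrt(cells)
    if side * side != cells:
        raise ValueError("this sketch assumes a square array of cells")
    return 2 * side + math.log2(bits_per_cell)


GIGABIT = 2 ** 30
for bits in (1, 256, 64 * 1024):
    print(bits, addressing_cost(GIGABIT, bits))   # 65536.0, 4104.0, 272.0

print(GIGABIT // 8 // 2 ** 20)                    # 128 (megabytes in a gigabit)
```

Under these assumed weights, moving from single-bit cells to 64 k-bit cells cuts the relative cost from 65536 to 272, which illustrates why very small memory cells are unattractive.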

Not only is such an array more compact and less expensive than using the up to 60 or so individual memory chips it replaces, but having a monolithic memory module allows as wide and as many data paths to be connected to it as the rest of the system will support. This can allow both a processor and a video subsystem to have independent wide paths to the same memory, for example. Memory cells and arrays using the architecture disclosed in the present invention can also use recent advances in chip-based memory architectures, such as fast on-chip SRAM caches, synchronous DRAMS, and RAMBUS's fast data transfer RDRAMs, and even exotic advances such as the IEEE's RamLink architecture (“Fast Interfaces for DRAMS”, “A New Era of Fast Dynamic RAMs”, “A Fast Path to One Memory” and “A RAM Link for High speed”, IEEE Spectrum, October, 1992).

The off-chip bottleneck of chip oriented architectures is likely to continue to worsen. Microscopic and macroscopic manufacturing improve at roughly the same rate, but doubling the capability of both allows four times as many circuits to be placed within a given chip's area, while only doubling the number of connections that can be made around its circumference. The 0.6 micron lithography of the MIPS R4400 processor chip, for example, creates such compact circuitry that the chip actually has an empty region around the processor core to make the overall chip big enough to support all its macroscopic connections to the rest of the system (“Mips Processors to push Performance and Price”, Electronic Products, December, 1992). The largest single consumer of these off-chip data paths with today's processors is access to off-chip memory.
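
The scaling argument above is an area-versus-perimeter relationship, sketched below in Python. The sketch assumes a fixed chip size, with circuit count scaling with the square of the linear density improvement and edge connections scaling linearly with it. The function name `scaling` is hypothetical.

```python
def scaling(density_factor):
    """Relative growth in circuits and edge connections for a fixed chip size
    when both fabrication scales improve by density_factor (linear)."""
    circuits = density_factor ** 2          # circuits fill the chip's area
    connections = density_factor            # connections line its circumference
    return circuits, connections, circuits / connections


for factor in (1, 2, 4, 8):
    print(factor, scaling(factor))
```

The third value in each result, circuits per available connection, grows in step with the improvement factor, which is the worsening bottleneck the paragraph describes.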

Accordingly, the fault tolerant monolithic data processing architecture in another embodiment of the present invention as shown in FIG. 3 combines one or more standard “mono-chip” RISC or CISC processors 380 fabricated on the same monolithic substrate 390 with the monolithic memory array 30 of memory cells 300 as described in the previous direct access memory embodiment of the present invention. While this will reduce the overall yield to the array's yield times that of the processor(s), it keeps all the processor/memory interconnections on a microscopic scale on a single monolithic region. This leaves the entire circumference of the whole region, which is considerably larger than that of a single chip, free for connections to other subsystems. Using this embodiment one can reduce the entire memory and processor subsystems of an advanced desktop system (such as a 486 with 16 megabytes of main memory) to a single credit-card sized module. It is anticipated that arrays with defective processors can have those processors disabled and still be used as memory-only arrays, and that other functions, such as BIOS chips 380′, video accelerators 380″, or I/O controllers 380′″ could be integrated in addition to, or instead of, the processor(s).

The use of single processors is itself increasingly a bottleneck. While dramatic performance improvements have come about through the fabrication of ever smaller components and ever more complex chips, the demand for compute power has increased faster still. But developing faster processors is not the only way to increase processing power for such tasks. Instead of using one processor, parallel processing architectures use many processors working simultaneously on the same task. Multi-processor systems with several processors sharing a common memory have dominated the mainframe and supercomputer world for many years, and have recently been introduced in desk-top computers. While these parallel computer systems do remove the von Neumann single-processor bottleneck, the funnelling of memory access of many processors through a single data path rapidly reduces the effectiveness of adding more processors, especially when the width of that path is limited by the off-chip data flow bottleneck. Most massively parallel architectures solve this multi-processor memory contention by having local memory associated with each processor. Having more than one processor chip, however, adds inter-processor communications to the already crowded off-chip data flow, intensifying pressure on the off-chip bottleneck.

Accordingly, the fault tolerant monolithic data processing architecture in another embodiment of the present invention overcomes this bottleneck with a highly redundant network of cells containing both memory and processors that can be organized into a regular fault-free array of cells, thus integrating a complete highly parallel or even massively parallel processing array and its local memories into a single monolithic entity. Preferred embodiments include means for the cells to communicate through a global data bus, and means for the cells to be directly addressed. This allows the combined memories of the cells to act as a shared main memory for the processor array as a whole when processing a serial task, and still allows the array to be a local-memory parallel processing array when processing parallel tasks. A global bus is also exceptionally useful for communicating instructions to the processors when operating in SIMD (Single Instruction, Multiple Data) mode, or for data when in MISD (Multiple Instruction, Single Data) mode. Such embodiments are ideally suited for use as a parallel processing graphics accelerator. Further embodiments include means for using an array cell's registers and/or local cache memory as a cache for another processor's access to that cell's memory, as SRAM cache is now used on fast DRAM chips to boost their performance.

While an array of cellular processing elements which communicate solely through a global data bus is efficient at solving action-at-a-distance parallel computing problems such as galactic evolution, where every star exerts a gravitational pull on every other, most parallel processing tasks involve higher degrees of connectivity. Because of this, most parallel data processing systems use a higher degree of connectivity between their processors. For small numbers of processors, a “star” configuration, where every processor has direct connections to every other processor, is most efficient. But as the number of processors grows, the number of connections to each processor grows, too. With today's technology a chip-based processor can devote no more than a couple of hundred connections to this, so with 32-bit wide data paths the off-chip bottleneck limits this scheme to at most a dozen processors. Even the monolithic architecture disclosed in the present invention can support fewer than a hundred processors in such a configuration when redundant paths are factored in. Because many massively parallel tasks can exploit thousands of processors, most massively parallel architectures adopt a connectivity scheme intermediate between a single global bus and every-processor-to-every-processor connections. The most prevalent of these is the “hypercube” connectivity used by Thinking Machines Corp. in its “Connection Machine” computer. But most massively parallel tasks, such as fluid dynamics, involve at most three-dimensional neighbor-to-neighbor interactions rather than random processor-to-processor connections, allowing simpler interconnection schemes to be efficiently employed.
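The growth in per-processor connections described above can be illustrated with a short Python sketch. It is an idealized illustration only: the function names are hypothetical, the mesh counts assume interior cells (or wrap-around edges), and it does not attempt to reproduce the specific processor limits quoted in the text, which depend on pin budgets and path widths not modeled here.

```python
def links_per_processor(n, scheme):
    """Direct links each processor needs in an n-processor system (idealized)."""
    if n < 1:
        raise ValueError("need at least one processor")
    if scheme == "all_to_all":
        # every processor wired to every other processor
        return n - 1
    if scheme == "hypercube":
        # one link per address bit; n must be a power of two
        if n & (n - 1):
            raise ValueError("hypercube size must be a power of two")
        return n.bit_length() - 1
    if scheme == "mesh2d":
        return 4  # interior cell: north, south, east, west
    if scheme == "mesh3d":
        return 6  # interior cell: four in-plane neighbors plus up and down
    raise ValueError("unknown scheme: " + scheme)


def total_links(n, scheme):
    """Total bidirectional links, counting each link once."""
    return n * links_per_processor(n, scheme) // 2


if __name__ == "__main__":
    for scheme in ("all_to_all", "hypercube", "mesh3d", "mesh2d"):
        print(scheme, links_per_processor(4096, scheme))
```

For a few thousand processors the all-to-all count is in the thousands per processor, the hypercube count is about a dozen, and the neighbor-to-neighbor counts stay constant, which is the trade-off the paragraph describes.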

When inter-cell connections are added to a given array cell, corresponding connections must be added to all spare cells capable of directly replacing that array cell. When each spare cell can directly replace a number of array cells, the interconnection pattern grows quite complex. FIG. 4B shows the intercell connections needed for one array cell and one spare cell in a network of array cells 400 and spare cells 400′ where each array cell has connections to its four physical neighbor array cells, when using the 3-for-1 spare cell scheme of FIG. 1A. FIG. 4C shows the corresponding interconnections when the 4-for-1 spare cell scheme from FIG. 1B is used, and FIG. 4D shows the corresponding interconnections when the 5-for-1 spare cell scheme from FIG. 1C is used, which would be suitable for RISC processing cells up to a few millimeters on a side with today's lithography (only the connections from the top and left sides of one spare cell are shown for clarity in FIG. 4D; connections from the bottom and right can be deduced by symmetry). FIG. 4D also includes a plurality of connections to some of the cells because the spare cell shown can replace one of a plurality of neighbors of each of those cells; the patterns in FIGS. 4B and 4C require that distinguishing which neighbor of a given array cell a spare cell has replaced be handled internally by that array cell. These patterns can be extended to higher-dimensional or even hypercube arrays, as long as each connection for each array cell has a corresponding connection in each spare cell that can replace that array cell. Because the monolithic nature of the array allows over an order of magnitude more connections to each processor than in a chip-based array, further embodiments can also provide row and/or column oriented addressing and data busses in addition to neighbor-to-neighbor and global data bus connectivity. It is even possible to provide complete hypercube connectivity as well for those cases where it would improve efficiency enough to be worth the added complexity.
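The rule that every connection of an array cell needs a counterpart in each spare cell able to replace it can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the grid, the spare named "S", and the set of cells it can replace are hypothetical and are not the layouts of FIGS. 1A to 1C, and connections between spares are left out.

```python
def grid_neighbors(rows, cols):
    """Map each array cell (r, c) to the set of its physical four-neighbors."""
    neighbors = {}
    for r in range(rows):
        for c in range(cols):
            candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            neighbors[(r, c)] = {
                (nr, nc) for nr, nc in candidates
                if 0 <= nr < rows and 0 <= nc < cols
            }
    return neighbors


def spare_links(array_neighbors, replaces):
    """For each spare, the array cells it must be wired to.

    `replaces` maps a spare's name to the array cells it can directly
    replace (a hypothetical assignment). The spare needs the union of
    the connections of every cell it might stand in for.
    """
    required = {}
    for spare, replaceable in replaces.items():
        links = set()
        for cell in replaceable:
            links |= array_neighbors[cell]
        required[spare] = links
    return required


if __name__ == "__main__":
    nbrs = grid_neighbors(3, 3)
    # Hypothetical spare that can replace either of two adjacent array cells.
    print(sorted(spare_links(nbrs, {"S": [(0, 0), (0, 1)]})["S"]))
```

Even in this tiny case a spare covering two cells needs more links than either cell alone, which is why the pattern grows complex as the number of cells each spare can replace rises.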

Another embodiment for systems expected to run predominantly serial software would include one or more fast serial processors fabricated on the same monolithic substrate as the cell network (with the serial processors being disabled when defective). The cell array could act as fast memory for the serial processor for serial tasks, and as a parallel accelerator for processing parallel tasks, such as sorting, searching, and graphics acceleration. Another embodiment would include means for a spare cell replacing a defective cell to copy that defective cell's memory, enabling dynamic recovery from some post-manufacturing defects.

The commercial viability and speed of acceptance of a new data processing architecture are greatly enhanced if systems based on the new architecture are compatible with existing software. Existing parallel systems have no facilities for using multiple processors to speed up the processing of serial programs at less than an independent thread level. But with the architecture disclosed in the present invention, even massively parallel systems will be only slightly more expensive than mono-processor systems of the same processor speed (instead of orders of magnitude more expensive), so they may often be used for serial tasks. Adding multiple-pipelines-per-processor, branch predictors, instruction prefetchers and decoders, etc., the approach used by high-end processor chips today, would greatly increase the cell size and decrease cell yield, reducing the number of cells available for parallel tasks and requiring an even more fault-tolerant cell network. But each cell contains a superset of the features needed to act as a pipeline, etc. for its own instruction set. Further embodiments therefore include the ability for one cell to use its neighboring cells as independent pipelines or other accelerators to boost its serial instruction throughput.

Because in most suitable spare cell interconnection schemes only a small fraction of the spare cells are defective themselves or are used to replace defective array cells, most of the perfectly good spare cells are left over after forming the fault-free array of cells. These spare cells have numerous direct connections to other leftover spare cells, as well as connections to the array and the array's busses. This makes these left-over spare cells ideal for running serial tasks, as they have lots of direct connections to cells that can be used as accelerators such as independent pipelines, branch predictors, speculative executors, instruction prefetchers and decoders, etc. This should allow clusters of small cells to match the throughput of complex mono-chip processors operating at the same clock speed. This also leaves the entire regular array free to serve as a high-performance memory system or a parallel graphics accelerator for the “serial processing” cell cluster, so overall system throughput may actually be higher than conventional systems even on serial processing tasks. Further embodiments therefore include means for a cluster of cells to cooperate when processing a serial task by using a plurality of cells as accelerators for that task.

The use of “left-over” spare cells can be extended in other ways. Although these cells do not form a regular array, they are linked together in a network. This allows one cell to access another's data via any intermediate cells. While this does not have the performance of direct addressability, it is nonetheless sufficient to allow one left-over cell to map the combined memories of other left-over cells into a contiguous medium-performance address space. This allows what might otherwise be wasted memory to be put to use as a RAM disk, disk cache, I/O buffer and/or swap space for virtual memory. At today's lithography, this would amount to around 12 megabytes on a credit-card sized system, and around 50 megabytes on a 6″ full-wafer system. Instead of passing signals through intermediate cells, regional-data-bus embodiments where power and heat are not critical issues could use intermediate performance bus-based addressing for the spare cells in the RAM disk, etc.
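One way to picture mapping the memories of left-over cells into a contiguous address space is the Python sketch below. It is an illustration under assumptions, not the disclosed mechanism: the cell names, the link table, the equal memory size per cell, and the choice to order cells by hop count from the mapping cell are all hypothetical.

```python
from collections import deque


def build_memory_map(links, root):
    """Breadth-first walk of the left-over cell network starting at `root`.

    Returns (order, hops): `order` lists reachable cells nearest-first, and
    `hops[cell]` is the number of cell-to-cell steps needed to reach it.
    """
    hops = {root: 0}
    queue = deque([root])
    while queue:
        cell = queue.popleft()
        for nxt in links.get(cell, ()):
            if nxt not in hops:
                hops[nxt] = hops[cell] + 1
                queue.append(nxt)
    # Sort by distance, then by name, so the map is deterministic.
    order = sorted(hops, key=lambda c: (hops[c], c))
    return order, hops


def translate(order, bytes_per_cell, address):
    """Turn a contiguous address into (owning cell, offset within that cell)."""
    index, offset = divmod(address, bytes_per_cell)
    if address < 0 or index >= len(order):
        raise IndexError("address outside the mapped space")
    return order[index], offset


if __name__ == "__main__":
    links = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b"]}
    order, hops = build_memory_map(links, "a")
    cell, offset = translate(order, 1024, 2050)
    print(cell, offset, "hops:", hops[cell])
```

Placing the nearest cells at the lowest addresses keeps the most frequently used part of such a RAM disk or swap space the fewest hops away, which is a design choice of this sketch rather than something the text requires.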

Computer displays can be built on wafers today, but these displays lack defect tolerance, so every pixel and its support circuitry must be functional or there will be an obvious “hole” in the array. While million-pixel arrays can be made defect free (although with persistently low yields), a wafer can hold many times that many pixels. The necessity for perfection would, however, reduce yields of such arrays to near zero. Because the human eye can handle orders of magnitude more pixels than today's displays use, advancements in lithography alone would be unlikely to solve this problem for many years. Previous fault tolerant architectures are not well suited for output arrays; the N-fold replication schemes devote too small a fraction of the array's surface to active elements, and the more sophisticated cell-based schemes have multiple shifts, bounded only by the edge of the array, in array cell positions (and hence pixel positions) for each defect handled.

The fault tolerant monolithic data processing architecture according to another embodiment of the present invention therefore overcomes the display resolution limit with an N-for-1 redundant monolithic network of cells that can be organized into a large regular fault-free array of cells, each of which has at least one optical sub-pixel (a color display might have several sub-pixels per pixel), and where each array cell has a plurality of physical neighbors that can directly replace its functions without propagating the displacement to other cells, and without the overhead of N-fold replication of the array cells. Embodiments of the fault tolerant architecture of the present invention as shown in FIGS. 1A, 1B, 1C, 1D and 1E produce regular arrays of cells that can handle high levels of defects with each defect merely shifting the functions of one cell to a spare neighboring cell. If the cells are small enough so that such a shift is not normally noticed by a human eye (approximately 50 microns at a normal reading distance), the defect is bypassed and the array can still be considered free from uncorrectable faults in spite of one or more defective pixels or sub-pixels. Several technologies for fabricating pixels below the visible-optical-defect size of 50 microns are already known in the art. Sony's Visortron (“... and visorTrons from Japan”, Popular Science, March 1993) uses 30-micron LCD sub-pixels, and Texas Instruments' Digital Micromirror Device (“Mirrors on a chip”, IEEE Spectrum, November 1993) uses 17-micron pixels. Other potentially suitable types of optical output means include, but are by no means limited to, light emitting diodes, semiconductor lasers and ultra-miniature cathode ray tubes, microscopic mirrors and field effect display elements.

Even when the display is fabricated on the same substrate as other parts of the system, the display is essentially still a separate device for which data must be gathered and to which data must be sent. Having non-display regions on the same substrate as the display also reduces the percentage of the substrate area that can be devoted to the display, at least until production technology supports multiple layers of complex circuitry (in contrast to memory and processing, larger physical dimensions are often advantageous for a display). The fault tolerant architecture of the present invention can support cells with a variety of useful properties, allowing display, memory, and processor functions all to be supported by the same spare cell scheme. Integrating the system's main memory array with its display array would be highly advantageous because this memory makes up the bulk of a typical system's circuit count. Integrating this memory with the display array thus allows the display to cover most of the substrate area.

The fault tolerant monolithic data processing architecture according to another embodiment of the present invention therefore integrates the display and main memory for a system into a single array with a highly redundant monolithic network of cells that can be organized into a regular fault-free array of cells, where the array cells contain both one or more direct output elements and sufficient memory so that the array as a whole contains at least half of the system's active same-substrate memory. This can be accomplished without interfering with the array's defective pixel tolerance by using a cell size less than the visible-optical-defect limit of 50 microns. At the density of today's 16 Mbit DRAMs, this would limit cell size to approximately 256 bits per cell, with sufficient circuitry to support one pixel or 3 sub-pixels, and connections for a redundant scheme such as that shown in FIG. 1A. Due to the small cell size the raw cell defect rate should be under 0.025%, even with leading-edge lithography. The 3-for-1 redundancy provided by the spare cell arrangement of FIG. 1A is sufficient to provide an extremely high yield at this low raw error rate. With 3 color sub-pixels per cell, a 6-million-cell array would pack an 8-times-better-than-SVGA display and 48 MBytes of fast memory onto a single 8-inch wafer.

Arrays of larger cells would be more efficient in many cases than arrays of 50-micron or smaller cells because more of the area could be devoted to cell contents, as opposed to intercell connections for fault tolerance and to the rest of the system. In output arrays where the cell size exceeds the threshold for defects apparent to the human eye (or other receiving device), however, spare cells that have their own pixels will be obviously out of alignment when they replace array cells. While the cells in previous display embodiments of the present invention can be made small enough to hide such defects, cells containing kilobytes of memory or RISC processors are far too large at today's lithography for such a scheme.

The fault tolerant architecture of the present invention therefore provides a highly redundant network of cells that can be organized into a regular fault-free array of cells, where the array cells contain one or more direct output elements, and where spare cells 700′ have the capability to control an array cell's display pixels when they replace that array cell 700, as shown in FIG. 7A. This lets the array appear uniform to the eye (or other receiving device) even when defective array cells are replaced by keeping the spare cell's output lined up with the cell that would normally have produced it. Cells larger than the visible-optical-defect size can also have more processing power, which allows more sophisticated compression schemes to be used. Sufficient processing power for a cell to figure out which of its pixels fall within a triangle, for example, allows the array to process shaded triangles directly rather than requiring the main CPU or a separate graphics accelerator to process them, and sufficient processing power to handle textures allows textured polygons to be used, etc.
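A cell deciding which of its own pixels fall within a triangle could, for example, use the standard edge-function test sketched below in Python. This is only an illustration: the text does not say which method a cell would use, and the function names, the square pixel block per cell, and the sampling at pixel centers are assumptions.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed area test: which side of the line a->b the point p lies on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def pixels_in_triangle(cell_x, cell_y, size, triangle):
    """Pixels of one cell covered by `triangle`.

    The cell is assumed to own the size-by-size pixel block whose top-left
    pixel is (cell_x, cell_y). A pixel counts as covered when its center
    lies inside the triangle or on its boundary, for either vertex winding.
    """
    (x0, y0), (x1, y1), (x2, y2) = triangle
    covered = []
    for y in range(cell_y, cell_y + size):
        for x in range(cell_x, cell_x + size):
            px, py = x + 0.5, y + 0.5
            w0 = edge(x0, y0, x1, y1, px, py)
            w1 = edge(x1, y1, x2, y2, px, py)
            w2 = edge(x2, y2, x0, y0, px, py)
            all_non_negative = w0 >= 0 and w1 >= 0 and w2 >= 0
            all_non_positive = w0 <= 0 and w1 <= 0 and w2 <= 0
            if all_non_negative or all_non_positive:
                covered.append((x, y))
    return covered


if __name__ == "__main__":
    tri = [(0, 0), (4, 0), (0, 4)]
    print(len(pixels_in_triangle(0, 0, 4, tri)))  # cell overlapping the triangle
    print(len(pixels_in_triangle(4, 4, 4, tri)))  # cell entirely outside it
```

Because each cell tests only its own pixels, the triangle's three vertices are the only data that would need to be broadcast to the array.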

With spare cells using the pixels of the cells they replace, however, the defective pixel tolerance is lost. While for some applications a defective output pixel would not be as serious as a defective processor or memory, in other applications the need to avoid defective pixels would limit array size in the absence of defective-pixel tolerance. For these applications the previous embodiment is only useful for displays that can be made without defective pixels, which would currently limit the display to a few million pixels. It would thus be extremely advantageous to restore the defective pixel tolerance for macroscopic cells.

The fault tolerant monolithic data processing architecture according to another embodiment of the present invention therefore overcomes the output array size limit for arrays of macroscopic cells with a highly redundant monolithic network of cells that can be organized into a large regular fault-free array of cells where each cell has direct output means including spare pixels as well as means for memory and/or means for processing. In order for spare pixels to be useful the maximum distance between a spare pixel and the pixel it replaces must be small enough so as not to cause an inconsistency noticeable to the receiver. For the human eye at a comfortable viewing distance, this is around 1/500 of an inch (0.05 mm), although with a blurring mask 0.1 mm would be acceptable. The architecture disclosed in the present invention can support output to vast numbers of pixels, and displays with pixels smaller than 1/500 inch are already in production. With the fault tolerance that the architecture of the present invention supplies, it is anticipated that pixels could be made as small as the memory that controls them. A typical implementation with today's lithography would use cells that nominally have 4096 pixels arranged in a 64×64 matrix, but actually have 72×72 pixels, with the pixels addressed by row and column pixel lines in a manner similar to the word and bit lines of memory chips. During normal operation, each 9th line would be an “extra” line. The extra lines could be programmed to be blank, leading to a barely noticeable “stippled” effect, or to display the average of their neighboring lines at every point, producing a smoother looking display, or even to alternate between their neighboring lines' values. When replacing a line containing a defective pixel, the nearest spare line would take on its neighbor's values, leaving that line free to in turn take on its neighbor's values, until the defective line was reached. With the example above and 0.05 mm pixels, this would cause a 0.05 mm shift in the pixels in a region 3.6 mm by 0.05-0.2 mm, which is unnoticeable to the unaided eye from a normal viewing distance. This provides a display many orders of magnitude more error tolerant than today's absolute-perfection-required displays. The length of the shifted area can be halved when necessary by dividing a cell's direct output pixels into quadrants with control circuitry around the perimeter instead of on just two sides. It is also possible to use a somewhat more sophisticated pixel-level fault tolerant scheme. While the fault tolerant scheme of U.S. Pat. No. 5,065,308 is not suitable for the cell array as a whole, it could easily be adapted to provide fault tolerance for each individual cell's pixels by treating each pixel as one of its cells. With 0.5 micron lithography this would, unfortunately, consume roughly ⅓ of the cell's total circuit count, but improvements in lithography should reduce this to an acceptable fraction within less than a decade.
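The spare-line shifting described for a cell with 72 physical lines and 64 nominal lines can be sketched in Python as follows. The sketch makes one assumption the text does not state: the spare line is placed in the middle of each group of nine lines, which is one layout consistent with the quoted shift width of one to four lines (0.05-0.2 mm). The names are hypothetical, and only a single defective line is handled.

```python
LINES = 72        # physical pixel lines per cell side
GROUP = 9         # every ninth line is a spare
SPARE_OFFSET = 4  # assumed position of the spare within each group of nine


def is_spare(physical):
    return physical % GROUP == SPARE_OFFSET


def line_map(defective=None):
    """Entry i is the physical line that displays nominal line i.

    With no defect, the 64 nominal lines use the 64 non-spare lines in
    order. With one defective line, the lines between it and the nearest
    spare each move one step toward that spare, so the defective line is
    left unused.
    """
    normal = [p for p in range(LINES) if not is_spare(p)]
    if defective is None or is_spare(defective):
        return normal  # a defective spare is simply left unused
    spare = (defective // GROUP) * GROUP + SPARE_OFFSET
    step = 1 if spare > defective else -1
    low, high = sorted((defective, spare))
    return [p + step if low <= p <= high else p for p in normal]


def shifted_lines(defective):
    """How many nominal lines end up displaced by one pixel."""
    return sum(1 for a, b in zip(line_map(), line_map(defective)) if a != b)


if __name__ == "__main__":
    print(line_map(0)[:6], shifted_lines(0))
```

With 0.05 mm pixels, each displaced line is a strip 72 pixels (3.6 mm) long, and between one and four such lines move for any single defective line under this layout.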

Although these spare pixel schemes do have multiple pixel shifts per defective pixel, the shifts are only the length of a single pixel instead of the length of a whole cell, and the shifts are bounded by the nearest spare line or the relatively nearby edge of the cell rather than by the potentially far more distant edge of the whole array.

While for many types of I/O the advantages of direct I/O from each cell are overwhelming, this does not preclude adding means for other types of I/O, especially those whose resolution is on the scale of a whole array or larger rather than that of an individual cell, to the cell network as a whole as opposed to each cell. With rectangular arrays on round wafers this can be a good use for the considerable space around the edges of the arrays. Types of I/O suitable for this include, but are not limited to, acceleration, position and orientation detectors, sonic detectors, infrared or radio signal detectors, temperature detectors, magnetic field detectors, chemical concentration detectors, etc.

Systems that use wireless links to communicate with external devices are well known in the art. Cordless data transmission devices, including keyboards and mice, hand-held computer to desktop computer data links, remote controls, and portable phones are increasing in use every day. But increased use of such links and increases in their range and data transfer rates are all increasing their demands for bandwidth. Some electromagnetic frequency ranges are already crowded, making this transmission bottleneck increasingly a limiting factor. Power requirements also limit the range of such systems and often require the transmitter to be physically pointed at the receiver for reliable transmission to occur.

The fault tolerant monolithic data processing architecture according to another embodiment of the present invention overcomes the output array size limit with a highly redundant monolithic network of cells that can be organized into a large regular fault-free array of cells where each cell has means for input and output to a global data bus and direct input and/or output means as well as means for memory, and means for processing, and means for coordinating the phase and/or timing of the cell's direct inputs and/or outputs with those of other array cells. This allows the array of cells 1300 to act as a “phased array” for focusing on an external transmitter or receiver 135, as shown in FIG. 13. Spare cells that replace array cells in such an architecture can be useful in receiving or transmitting if they either have their own timing/phase control means or they use the replaced array cell's transmitting or receiving means 1304 (or if the maximum distance between a spare cell and the cell it replaces is small enough so as not to cause an inconsistency that interferes with reception or transmission). Because phased arrays by their nature involve sending or receiving the same signal through many cells, it is convenient to have the cells communicate through a global or regional data bus.

A further embodiment dynamically focuses on the external device through a differential timing circuit. For direct outputs whose signal propagation is slow compared to the speed of the global data bus, such as sonic direct output elements receiving data from an electronic bus, a simple way to implement the differential timing circuits is as follows: One cell (or a device associated with the array) is the target or source of the signal to be focused. This cell or device will be referred to as the controller. The external device to be focused on sends a short reference signal strong enough for each array cell to pick up individually. When the controller picks up this signal, it waits long enough so that all the cells will have received it, and then sends its own reference signal across the global data bus. Each cell measures the delay time between when it receives the external reference signal and the reference signal on the global data bus. When all the cells receive data to be transmitted from the global data bus, each cell delays for its delay time before transmitting that data. The cells that received the external reference signal later have shorter delay times, and thus send the data earlier. This causes the transmissions from all the cells to arrive at the external device simultaneously and in phase, effectively focusing the overall transmission upon it, as shown in the solid-line waves 1343. The cells' transmissions will not add constructively, and hence will not focus, at most other points 135′, as shown by the dashed line waves 1343′ (the cell timing delay difference for one cell is indicated by identical-length segments 1344).
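The differential timing procedure above can be checked numerically with a short Python sketch. It is a simplified model under stated assumptions: bus propagation is treated as instantaneous (the slow-signal case the paragraph describes), the external device emits its reference signal at time zero, and the cell positions, function names and the speed of sound used in the example are hypothetical.

```python
import math


def cell_delays(cell_positions, device, speed, bus_reference_time):
    """Delay each cell measures: bus reference time minus its own arrival time."""
    delays = []
    for position in cell_positions:
        heard_at = math.dist(position, device) / speed
        if bus_reference_time < heard_at:
            raise ValueError("controller must wait until every cell has heard the signal")
        delays.append(bus_reference_time - heard_at)
    return delays


def arrival_times(cell_positions, delays, target, speed, data_time=0.0):
    """When each cell's transmission reaches `target`.

    Every cell receives the data from the bus at `data_time`, waits its
    own delay, then transmits at the given propagation speed.
    """
    return [
        data_time + delay + math.dist(position, target) / speed
        for position, delay in zip(cell_positions, delays)
    ]


if __name__ == "__main__":
    cells = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0)]  # metres, hypothetical layout
    device = (0.05, 1.0)
    speed_of_sound = 343.0                        # m/s, sonic output assumed
    delays = cell_delays(cells, device, speed_of_sound, bus_reference_time=0.01)
    print(arrival_times(cells, delays, device, speed_of_sound))
    print(arrival_times(cells, delays, (1.0, 0.3), speed_of_sound))
```

At the device every arrival time equals the data time plus the bus reference time, because each cell's delay cancels its own propagation time; at other points the arrival times spread out, so the transmissions do not add constructively there.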

The same timing works when the cells receive data, too. Each cell delays (by its delay time) before putting received data on the global bus, so cells that receive their data later delay shorter times and all signals from the source get added together on the bus. With signals from sources other than the one being focused on, the signals do not all arrive in phase, so their effect is much reduced. When receiving data, once the focusing is established it can be maintained even if the external device moves by each cell checking its timing against the collective global signal. This focusing should lead to vast improvements in areas such as voice input to computers, which currently has great difficulty picking out a given voice from background noise. With a dynamically focusing array to receive the sound input and a processor array to interpret it, computer speech recognition should be practical in a wide variety of real-world situations.

This phased array technique can also be adapted to direct outputs whose external signal propagation speed is comparable to or greater than that of signal propagation on the global bus, such as radio transmission. First the timing of the global bus must be taken into consideration. If the same cell or device is always the controller, the time for data to reach a given cell is a constant that can be controlled at manufacturing time; probably the easiest way is to provide paths of equal length to every cell, either for the global data bus or for a separate timing signal. If the global bus timing cannot be compensated for at manufacturing time, an array containing an orientation detector can calculate the bus timing for each cell by comparing calculated delay times for various orientations (the bus timing remains constant regardless of orientation, while the propagation timing does not). For electromagnetic radiation, however, the required delay times are too small for any current technology, but the phase angle of the output can be controlled instead. This is most effective for frequencies whose wavelength is at least twice the width of a single cell, but less than four times the width of the entire array. For wafer sized or larger arrays and electromagnetic radiation, this covers the VHF and UHF TV bands. Arrays smaller than a credit card would achieve only limited focusing of VHF signals, but would still work well in the UHF band. An especially preferred embodiment would combine direct phased array receiving means for such signals with sufficient processing power to decode standard TV or HDTV signals and sufficient optical outputs to display a complete standard TV or HDTV picture, as this creates a compact, low-cost, low-power, monolithic TV system.

One of the most important kinds of data to focus, however, is optical data, and the frequency of optical signals is so high that even direct phase control for focusing is currently impractical. Directional control of optical signals, however, is practical. For constant focusing it is easy to mould a pattern of small lenses on a plastic sheet that can form the surface of an output or input array, as is done in SONY's Visortron. This is especially useful for head-mounted arrays because these can be held at constant, pre-determined orientation and distance from the viewer's eyes, and because they can be close enough to have each cell's pixels visible by only one eye, eliminating the need for a single cell to direct different images to different eyes. For non-head-mounted displays, fixed-focusing can be used to allow images to have some apparent depth (as long as the display is held at approximately the right distance and orientation) by having different pixels directed toward each eye.

Dynamic focusing, however, has numerous advantages over fixed focusing. For non-head-mounted displays, adding directional control to the cells' optical outputs allows the array to present a stereoscopic image regardless of viewing angle and distance. Control of focal length is even more advantageous, as it allows displays, whether head-mounted or not, to “almost focus” in such a manner that the receiving eye's natural focusing will cause the eye to “see” those pixels as being at a given distance, thus producing true 3-dimensional images as far as the eye can tell. Further embodiments of the present invention therefore include means for optical input and/or output in each cell along with means for that input and/or output to be dynamically focused. This can be accomplished through holographic lenses, which have been pioneered for 3-dimensional optical storage systems (“Terabyte Memories with the Speed of Light”, BYTE, March 1992). Because each cell can have enough processing power to control a holographic lens to focus on a given point, the array as a whole can focus on that point. Since each cell can focus independently, separate regions of the array can also focus on different points. While holographic lenses are likely to prove most practical in the short run, other focusing methods would be applicable. A fly's eye, for example, uses physical deformation of a gelatinous lens to focus each cell on a point of interest to the fly, and a similar scheme on a wafer could use cantilevered silicon beams or piezoelectric materials deformed by electrical forces.

Current computer systems are made from a number of separately manufactured components connected together and placed inside a plastic or metal box for protection. This creates a system many orders of magnitude bigger than the components themselves. But the present architecture allows all lithographically fabricated components, from input and output to memory and processors, to be integrated on a single substrate, leaving only the power supply and mass storage systems as separate devices. Because the present architecture reduces power consumption, it should be feasible to power a system based on it through batteries and/or photovoltaic means. Both thin-film photovoltaic cells and thin high-performance lithium batteries can be produced on wafer production lines (“Thin-film Lithium Battery Aims at World of Microelectronics”, Electronic Products, December 1992), allowing their integration into the architecture of the current invention with today's technology. It is also possible to lithographically fabricate an individual battery (or other power storage means) and/or photovoltaic means for each cell so that ALL system components have at least the same cell-level redundancy and no fault will interfere with the proper operation of more than a few directly replaceable cells. In such embodiments it would be advantageous for cells to be able to join with their non-defective neighbors in a regional power-sharing bus. In an ideal embodiment ambient light that was not reflected as part of the direct output would be absorbed by a photovoltaic cell, and the system would go into a power-absorbing standby mode when left idle for a given period of time. If equipped with sufficient photovoltaic receptor area, a carefully designed array could be powered entirely by ambient light, eliminating the need for external power supplies and creating a completely self-contained monolithic system, although it is expected that in practice additional global connections for an external power source will be advantageous in most cases.

While systems based on the previous embodiments of the present invention represent significant advances in input, processing, memory, and output, semiconductor wafers are fragile and limited in size. It is, however, possible to transfer a thin layer of crystalline silicon including completed circuitry from the surface of a wafer to another substrate, including a flexible one such as a tough plastic (“Prototype Yields Lower-Cost, Higher Performance AMLCDs”, Electronic Products, July 1993, and “Breaking Japan's Lock on LCD Technology”, The Wall Street Journal, June 1993). By placing a plurality of such transfers contiguously onto a large semi-rigid substrate, and then interconnecting the transfers through alignment insensitive contacts (such as those shown in FIG. 4E) in a final metal layer, a system of any size needed could be produced. If such a system were covered with a protective plastic layer, the whole system would be extremely tough and durable. Because the present invention teaches integrating an entire system on the surface of a wafer, circuit transfer will allow an entire system according to the current invention to be reduced to a tough, durable, light-weight sheet as thin as a fraction of a millimeter, although sheets approximately as thick and stiff as a credit card are expected to be ideal for most uses.

A further embodiment of the fault tolerant monolithic data processing architecture of the present invention therefore overcomes the wafer size limit with a plurality of highly redundant monolithic networks of cells that can each be organized into a large regular fault-free array of cells where each cell has direct optical output means as well as means for memory and processing, and where the monolithic networks are affixed close to each other on a substrate and the networks are subsequently connected to each other to extend the inter-cell connection patterns across the inter-network boundaries. More preferred embodiments use a non-fragile substrate. Although the inter-transfer connections can only be made on one metal layer instead of the up to five metal layers currently practical within a given transfer, an order of magnitude more connections can still be made to one side of a 3 mm cell than off-chip connections can be made to the whole perimeter of a standard-architecture 15 mm chip. Arrays based on the present invention should be ideal candidates for such transfers because their defect tolerance allows them to survive rougher handling than traditional circuitry. Circuit transfer will also be useful in adding additional thin memory or processing layers to systems built according to the present architecture. This is expected to be especially useful in adding multiple low-power memory layers to compact diskless systems.

Current wafer based production systems are efficient for producing monolithic regions no bigger than wafers, but the architecture disclosed in the present invention can efficiently handle networks far bigger than a wafer. Circuit transfer techniques, however, can be used for raw silicon as well as for completed circuits, so large areas of a substrate can be covered with monolithic transfers of crystalline silicon with only thin lines of inconsistencies between the transfers. By trimming and placing the transfers to 1/500 inch (50 micron) accuracy (the visible defect limit for the human eye) and bridging the inter-transfer gaps by metal layers during the fabrication process, these seams can be hidden between the cells. The architecture disclosed in the present invention lets cells or regions of cells be connected through alignment-insensitive contacts, allowing regions larger than a single production-line mask to be fabricated, and allowing multiple low-cost masks to be applied either sequentially or simultaneously. It is thus possible to perform all production steps for systems based on the architecture of the present invention, including lithography, on a production line based on a large or a continuous sheet of substrate, rather than on individual wafers. Similar production lines are currently used in the manufacture of continuous sheets of thin-film solar cells, although not with transferred crystalline silicon. Because of economies of scale, such continuous linear production should be far cheaper than individual-wafer based production and subsequent circuit transfer.

Portability is an increasingly important issue in computer systems. By integrating an entire data processing system in a microscopically interconnected region, the present invention greatly reduces the size, cost, and power requirements of the system. Such regions can also be fabricated on or transferred to flexible substrates, allowing complete one-piece computer systems to be built on non-fragile substrates. When provided with a thin, transparent protective surface layer, such a system can be extremely rugged, being essentially shockproof and potentially even waterproof, as well as being compact.

In exceptionally preferred embodiments of the present invention, the entire network of cells of any of the embodiments described previously is therefore fabricated as a single thin flexible sheet. This can be achieved by fabricating the array on a thin plastic substrate onto which thin semiconductor and other layers are deposited or transferred. In the example shown in FIGS. 14A and 14B, the data processing system 140 is fabricated as follows: Layer 1460 is a smooth sheet of fairly stiff plastic (LEXAN, for example) around 150 microns (6 mils) thick. A thin-film lithium battery layer 1461 400 microns thick is deposited next, followed by a few-micron layer of plastic or other insulator, such as sputtered quartz. The battery of single cell 1400 is shown in FIG. 14A as battery 1440. A few-micron aluminum power distribution layer 1462 is created next, followed by another insulating layer. A small hole for each cell is etched (or drilled, etc.) through to the power layer, and a vertical “wire” is deposited inside to give the cell access to the power layer. Next the processor/memory layer 1463 is built: A layer of semiconductor material around 50 microns thick is deposited or transferred, and is doped through a low-temperature doping system (such as ion implant) in a manner similar to standard integrated circuit fabrication. Metalized layers are used to connect the elements in the processor/memory layer in the standard integrated circuit chip manner (except for connections to power and ground). This layer contains the bulk of the cells' circuitry, including input and output means 1402 to a global data bus, means 1418 for communication with neighboring cells, memory 1416, and processor 1420, and optional means 1436 to join a regional data bus. Next a layer of insulator is deposited everywhere except where connections to the ground layer will go. The ground layer 1464 is created in the same manner as the power layer 1462.
Holes are “drilled” through to contacts in the processor/memory layer and insulated vertical “wires” are deposited inside these holes to give the processor/memory layer 1463 access to the direct I/O layer 1465. This direct I/O layer 1465 is added next, with the direct optical outputs 1404 fabricated in a manner similar to any of those used in making pixels on a flat-panel portable computer display, the direct optical inputs 1424 fabricated in a manner similar to that used in making a CCD input chip, and the touch/proximity direct inputs 1430 fabricated as miniature standard capacitance touch/proximity detectors. All of these techniques are well known in the art. This layer can also contain sonic output means 1432 and sonic input means 1434. The top layer 1466 is a clear protective layer: 100 microns of LEXAN (polycarbonate) provides scratch resistance and brings the total thickness up to around 800 microns, or 0.8 mm. Thus the entire system 140 in this implementation is a stiff but not brittle sheet under a millimeter thick. When using continuous production techniques a large sheet built according to the present embodiment would be diced into a series of smaller sheets, with credit card sized systems and 8½″×11″ systems expected to be exceptionally useful.
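As a non-limiting illustration, the thickness figures of the FIG. 14A and 14B example can be tallied as in the following Python sketch. Only the four thicknesses marked as stated come from the description above; the nominal value of 3 microns for each "few-micron" layer, the treatment of every insulating layer and the ground layer as such a layer, and all names in the code are assumptions made for this illustration. The remainder it computes is simply whatever the quoted total of around 800 microns leaves for the direct I/O layer 1465 and any other layers whose thickness is not given.

```python
# Thicknesses (in microns) that the description states explicitly.
STATED_MICRONS = {
    "plastic substrate 1460": 150,
    "thin-film battery 1461": 400,
    "processor/memory 1463": 50,
    "protective layer 1466": 100,
}

# Layers described only as "a few microns" thick, or built "in the same
# manner" as such a layer. Treating all of them alike is an assumption.
FEW_MICRON_LAYERS = [
    "insulator over battery",
    "power layer 1462",
    "insulator over power layer",
    "insulator over processor/memory",
    "ground layer 1464",
]

ASSUMED_FEW_MICRONS = 3     # assumption: nominal value for "a few microns"
QUOTED_TOTAL_MICRONS = 800  # "around 800 microns" in the description


def thickness_budget(few_microns=ASSUMED_FEW_MICRONS):
    """Return (stated total, thin-layer total, remainder) in microns."""
    stated = sum(STATED_MICRONS.values())
    thin = few_microns * len(FEW_MICRON_LAYERS)
    remainder = QUOTED_TOTAL_MICRONS - stated - thin
    return stated, thin, remainder


if __name__ == "__main__":
    stated, thin, remainder = thickness_budget()
    print(f"stated layers:           {stated} microns")
    print(f"assumed thin layers:     {thin} microns")
    print(f"left for direct I/O etc.: {remainder} microns")
```

Under these assumptions the explicitly stated layers account for 700 of the roughly 800 microns, which is consistent with the sheet being under a millimeter thick.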

Small systems built this way should also be perfect for virtual reality glasses. Consider a current computer system with desktop metaphor software such as MS Windows, OS/2, System 7, etc. The “desktop” space is limited by the size of a monitor to far less than a real desktop. With this embodiment of the architecture of the present invention, such glasses will have more memory, better resolution, and far more processing power than a current desktop system. Furthermore, the left and right “lenses” can display stereoscopic images, and, if the glasses incorporate means for acceleration or orientation detection, the entire image can shift as the wearer's head turns. This could be used to create a whole “virtual office” metaphor far more useful than the “virtual desktop” metaphor of today's computer systems. The glasses can also include means (such as infra-red receivers) for communication with other electronic equipment (such as data gloves, a keyboard, etc.), or physical connections to an external power supply. Because systems built according to this embodiment are extremely portable, it is advantageous to design all of the elements for minimal power consumption (e.g., non-volatile SRAMs instead of DRAMs). While different orderings of the layers can be used, the ordering chosen for this example has some important advantages. The processor/memory layer is sandwiched directly between the power and ground layers for fast and easy access to power, which speeds up processing and reduces power requirements. Also, the ground layer and the power layer shield the sensitive processor/memory layer from external electromagnetic interference.

All examples used in this patent application are to be taken as illustrative and not as limiting. As will be apparent to those skilled in the art, numerous modifications to the examples given above can be made within the scope and spirit of the invention. While flat rectilinear arrays have been shown for simplicity, cells can be connected in triangular, hexagonal, octagonal or other regular configurations (although these are less useful for memory arrays). Such configurations need not be planar; the inner surface of a sphere, for example, can be covered with cells that can communicate optically with any other cell across the sphere without interfering with the rest of the array. It is also possible to use layers of cells with direct connections to input and output elements on the surface, or to use three-dimensional arrays of cells where only the surface cells have direct output capabilities. One way to achieve this effect with planar arrays is to have complementary direct inputs and outputs on both faces of the array so that separate arrays can be stacked into a 3-dimensional array processor of incredible speed.

Although today's silicon lithography has been used for easy understanding in the examples, the elements in and principles of the present invention are not limited to today's lithography, to silicon, to semiconductors in general, or even to electronics. An optical processor and memory array could be very conveniently coupled to direct optical inputs and outputs, for example. Nor are the cells' elements limited to binary or even digital systems. A hybrid system where each cell had analog input and analog connections to neighbors in addition to digital processing, memory, and direct output appears to be very promising for real-time vision recognition systems. It is also possible to have more than one processor per cell, such as transputer based cells with separate message passing processors.

Nor are the sizes or quantities used in the examples to be taken as maxima or minima, except where explicitly stated. For example, the disclosed architecture can pack a massively parallel computer into a contact lens and also support a multi-billion-cell array the size of a movie theater screen with equal ease.

I claim:
 1. A fault tolerant data processing architecture comprising: a monolithic network of cells having array cells and spare cells interconnected in such a manner that a plurality of spare cells can directly replace functions of any given array cell of the network should that given array cell prove defective without an overhead of a plurality of dedicated replacement cells for each array cell; and means for self-testing, wherein said means for self testing further comprise means for array cells that test valid to vote to assassinate a defective neighbor cell by disconnecting a power supply of the defective neighbor cell.
 2. A method for directly replacing defective array cells in a fault tolerant architecture, comprising steps of: testing each array cell; selecting a respective spare cell for each defective array cell; and replacing the functions of the defective array cell with the respective spare cell, wherein the step of testing each array cell comprises mapping a density of defective array cells surrounding each array cell.
 3. The method as claimed in claim 2 wherein functions of defective cells are replaced in an order starting with a defective cell having a highest surrounding defect density and proceeding toward a defective cell having a lowest surrounding defect density.
 4. The method as claimed in claim 2 wherein functions of defective cells are replaced in an order starting with a defective cell having a lowest number of unassigned spare cells and proceeding toward a defective cell having a highest number of unassigned spare cells.
 5. A fault tolerant data processing architecture comprising: a monolithic network of array cells and spare cells fabricated on a substrate that can be organized into a fault-free array of cells; means for directly addressing each cell of the fault-free array of cells; and means for each cell of the fault-free array of cells to send and receive data via a global data bus, wherein each cell of the array of cells further comprises a plurality of memory bits addressable by word lines and bit lines, each of the memory bits comprising at least one spare bit-line for tolerating a defective bit without requiring replacing the cell by a spare cell.
 6. The architecture as claimed in claim 5 further comprising functions selected from the group consisting of BIOS (basic input output system) chips, video accelerators and input/output controllers fabricated on the substrate.
 7. A fault tolerant data processing architecture comprising: a monolithic network of array cells and spare cells fabricated on a substrate that can be organized into a fault-free array of cells; means for directly addressing each cell of the fault-free array of cells; and means for each cell of the fault-free array of cells to send and receive data via a global data bus, wherein each cell of the array of cells further comprises a plurality of memory bits addressable by word lines and bit lines, each of the memory bits comprising at least one spare word-line for tolerating a defective bit without requiring replacing the cell by a spare cell.
 8. A fault tolerant monolithic data processing architecture comprising: a network of cells containing memory and processors that can be organized into a regular fault-free array of cells, wherein the array of cells provides a parallel processing array, and wherein the cells further comprise: registers and local cache memory; and means for using a register and/or a local cache memory of an array cell as a cache memory of a processor of another array cell.
 9. A fault tolerant data processing architecture comprising: a network of cells containing memory and processors that can be organized into a fault-free array; means for communication between neighboring cells; and means for input and output to a global data bus, wherein means for communication between neighboring cells comprises memory means placed between the neighboring cells and shared by the neighboring cells.
 10. The architecture as claimed in claim 9 wherein means for communication between neighboring cells further comprises alignment insensitive contacts.
 11. A fault tolerant data processing architecture comprising: a network of cells containing memory and processors that can be organized into a fault-free array; means for communication between neighboring cells; means for input and output to a global data bus; and means for a spare cell replacing a defective cell to copy the defective cell's memory whereby dynamic recovery from a post-manufacturing defect is enabled.
 12. A fault tolerant data processing architecture comprising a network of cells containing memory, processors and a direct output element that can be organized into a fault-free array, wherein the memory and processors are capable of extracting output data for the direct output from a compressed data stream.
 13. A fault tolerant data processing architecture comprising: a redundant monolithic network of cells that can be organized into a regular fault-free array of cells, wherein each cell has input and output means to a global data bus; means for input and output communication with neighboring cells in a plurality of dimensions; sufficient memory and processing power to decompress a data stream and to emulate at least one instruction from a microprocessor instruction set; full color direct output means; full color, capacitance touch/proximity direct input means; and means to join a regional data bus.
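
By way of a non-limiting illustration only, the two replacement orderings recited in claims 3 and 4 might be sketched in Python as follows. The sketch is not the claimed method itself: the square neighborhood used to measure surrounding defect density, the representation of candidate spare cells as a lookup table, the choice of the first unassigned candidate, the re-counting of unassigned spares after every assignment, and all function and parameter names are assumptions introduced here for the example.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

Cell = Tuple[int, int]  # (row, column) of an array cell; an assumed encoding


def surrounding_defect_density(cell: Cell, defects: Set[Cell],
                               radius: int = 1) -> int:
    """Count other defective cells within `radius` rows and columns.

    The square window is an assumption; claim 2 only recites mapping a
    density of defective array cells surrounding each array cell.
    """
    r, c = cell
    return sum(
        1
        for (dr, dc) in defects
        if (dr, dc) != cell and abs(dr - r) <= radius and abs(dc - c) <= radius
    )


def assign_spares(defects: Sequence[Cell],
                  candidates: Dict[Cell, List[str]],
                  order: str = "density") -> Dict[Cell, Optional[str]]:
    """Greedily give each defective cell one unassigned candidate spare.

    order="density":       highest surrounding defect density first (claim 3)
    order="fewest_spares": fewest unassigned candidate spares first (claim 4)
    A cell that finds no unassigned candidate is mapped to None.
    """
    defect_set = set(defects)
    used: Set[str] = set()
    assignment: Dict[Cell, Optional[str]] = {}
    pending = list(defects)

    def free_spares(cell: Cell) -> List[str]:
        return [s for s in candidates.get(cell, []) if s not in used]

    def priority(cell: Cell) -> int:
        if order == "density":
            return -surrounding_defect_density(cell, defect_set)
        if order == "fewest_spares":
            return len(free_spares(cell))  # re-counted after each assignment
        raise ValueError(f"unknown order: {order!r}")

    while pending:
        cell = min(pending, key=priority)  # ties keep the input order
        pending.remove(cell)
        options = free_spares(cell)
        spare = options[0] if options else None
        assignment[cell] = spare
        if spare is not None:
            used.add(spare)
    return assignment
```

In both orderings the defective cells with the most competition for spares are served first. For example, with defective cells (0,0), (1,1) and (2,2), where (1,1) can only use a hypothetical spare "s1" that the other two could also use, the density ordering assigns "s1" to (1,1) first and leaves the alternatives for its neighbors, whereas processing the cells in their listed order would leave (1,1) without a spare.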